Coming from various projects in the big data landscape of German companies, I have realized that many of them face similar problems. They have old legacy systems in place that are very good at what they were designed and built for at the time. Today, new technologies arise and new things become possible. People talk about stream processing and real-time data processing, and everyone wants to adopt these new technologies to invest in the future. Even though I personally think this is a reasonable impulse, I am also convinced that one first has to understand these technologies and what they were intended to be used for. While working with various clients, I realized that it is not easy to define clearly what stream processing is and which use cases we can leverage it for.

Therefore, in this series of articles I will share some of my thoughts, and we will examine both approaches to understand them better. In this first article I will try to draw a clear distinction between the well-known batch processing and stream processing.
The relational database model is probably the most mature and widely adopted form of database. I would guess that almost all companies have one in place as the central component of their operational business. Usually, a copy of this central data store exists for analytical use cases. The replication and the analytical queries, however, typically run as so-called batch jobs. Even though most of us have an extensive background in relational databases, I would like to start the article with a clear definition of the characteristics of a batch job.
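To make the replication pattern described above concrete, here is a minimal sketch of such a batch job in Python, using SQLite standing in for both the operational store and its analytical copy. All table and column names (`orders`, `orders_copy`, `id`, `customer`, `amount`) are hypothetical, chosen only for illustration; a real setup would use a scheduler (e.g. a nightly cron job) and a production database.

```python
import sqlite3

def run_batch_replication(operational: sqlite3.Connection,
                          analytics: sqlite3.Connection) -> int:
    """Copy a full snapshot from the operational store into the replica.

    This exhibits the classic batch-job shape: it starts at a defined
    point in time, reads a bounded input, and terminates once every
    row has been processed.
    """
    rows = operational.execute(
        "SELECT id, customer, amount FROM orders").fetchall()
    analytics.execute("DELETE FROM orders_copy")  # full refresh
    analytics.executemany(
        "INSERT INTO orders_copy (id, customer, amount) VALUES (?, ?, ?)",
        rows)
    analytics.commit()
    return len(rows)

# Demo with two in-memory databases standing in for the two systems.
op = sqlite3.connect(":memory:")
an = sqlite3.connect(":memory:")
op.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
op.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, "alice", 9.99), (2, "bob", 19.99)])
an.execute("CREATE TABLE orders_copy (id INTEGER, customer TEXT, amount REAL)")

copied = run_batch_replication(op, an)
print(copied)  # number of rows replicated in this batch run
```

Note that the job only sees the data that existed when it started; anything written to the operational store afterwards has to wait for the next scheduled run, which is exactly the latency that stream processing tries to eliminate.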