The overall software architecture of the I-BiDaaS project is shown below and contains a classical lambda-architecture, meaning a streaming and batch layer that together analyze process and also store the data. The batch layer is shown in red and brown and the streaming layer consists of the green parts. The former was described in a previous blog post; here we concentrate on the challenges of the streaming layer.
The I-BiDaaS platform can handle large amounts of data that are coming from many different sources, e.g. sensors, databases, blogs, services and also from IBM’s Test Data Fabrication platform. One can also upload historical data for analysis. The result of such analysis will lead to rules or models that can analyze new data in real-time or nearly real-time. These are normally written in Python or R using well-known libraries like scikit-learn, Tensorflow or keres, however, one can also write own code or use PMML.
For I-BiDaaS we have chosen as Streaming Analytics software Apama by Software AG (SAG) which is part of a leading Industrial Internet of Things (IIoT) platform, a community edition is available as well. The real-time data is fed in through a message broker and the analytics software will call the pre-defined rules and models at runtime. The data can also be enhanced by other means, e.g. by calling web services or query databases.
When the data analysis task is too time-consuming to react to anomalies in the desired time, the I-BiDaaS platform will offload these tasks to an array of GPUs. The data flow is sketched below and uses an in-memory data storage for faster data exchange. We have already seen an improvement by 50%.