Within the Track & Know EU Horizon 2020 project, the consortium partners have developed a big mobility data integrator (BMDI) to help experts and stakeholders advance their big data handling, processing and decision-making activities. The integrator is a fully featured, industrial-grade solution that supports online data streams as well as heterogeneous, contextual and archival batch data, not only from the mobility domain but from other domains as well. BMDI interoperates with modern data storage technologies and supports all major programming languages.
The BMDI platform consists of:
1. Data sources and data store components, which represent the data streams and data sources, in structured or unstructured format, that can be made available to and potentially connected to the BMDI platform. The BMDI platform can efficiently interoperate with the storage technologies of a big data ecosystem, such as RDBMS, NoSQL, Hadoop HDFS and Apache HBase, as well as with other persistence approaches such as MongoDB, MySQL and JDBC-compatible databases.
2. Connectors together with the Communication Platform, which connect external data sources and make them available to the BMDI platform through the “Stream Connectors” and “Data Source Connectors”. The technologies underlying the Streaming Component, together with the multiple workers of the Connect Component, allow scalable and secure stream data pipelines to be built. The Communication Platform (an Apache Kafka cluster) lets users publish and subscribe to streams of records, organised into topics, much like a message queue. The streams of records are stored in a durable, fault-tolerant way, and consumers (Big Data Apps) can process them as they occur.
3. Underlying Infrastructure. The underlying infrastructure spans multiple virtual machines (VMs) and provides the technologies and components needed to store and analyse the data. It offers a distributed computing environment, including Apache Spark, Hadoop, Kafka Streams and Spark Streaming, on which technology-agnostic algorithms can be run.
4. Big Data Apps. Big data applications can be implemented in all the major big data languages, including Python, Java, R and Scala. Applications written in traditional programming languages (C/C++, Ruby, Perl, PHP) can also be supported and interoperate efficiently with the BMDI platform. Big Data Apps also include the big data toolboxes being developed within the project, such as the Big Data Processing (BDP) toolbox, the Big Data Analytics (BDA) toolbox, the Complex Event Recognition (CER) toolbox and the Visual Analytics (VA) toolbox (more details are available on the Track & Know website: https://trackandknowproject.eu/).
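The publish/subscribe model described for the Communication Platform, with Big Data Apps consuming records as they arrive, can be illustrated with a minimal in-memory sketch. It assumes a topic behaves like an append-only log and that each consumer tracks its own read offset, which mirrors in principle how Kafka topics are consumed; it is not the Kafka client API. The class names `Topic` and `Consumer`, the function `average_speed_per_vehicle`, the topic name `gps-positions` and the record fields `vehicle_id` and `speed_kmh` are all hypothetical, chosen only to give the example a mobility flavour.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a topic: an append-only log.

    Reading never deletes records; each consumer keeps its own offset, so
    several Big Data Apps can process the same stream independently.
    """
    name: str
    records: list = field(default_factory=list)

    def publish(self, record: dict) -> int:
        # Append the record and return its offset (its position in the log).
        self.records.append(record)
        return len(self.records) - 1


class Consumer:
    """Hypothetical subscriber that remembers the next offset to read."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        # Return up to max_records records from the current offset onwards,
        # then advance the offset past the records just returned.
        batch = self.topic.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


def average_speed_per_vehicle(consumer: Consumer) -> dict:
    """Hypothetical Big Data App: read all available records in small
    batches and compute the mean speed per vehicle."""
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    while batch := consumer.poll(max_records=2):
        for rec in batch:
            totals[rec["vehicle_id"]] += rec["speed_kmh"]
            counts[rec["vehicle_id"]] += 1
    return {vid: totals[vid] / counts[vid] for vid in totals}


# Usage: publish three GPS-like records, then let two apps consume them.
gps = Topic("gps-positions")
for vid, speed in [("bus-1", 40.0), ("bus-2", 30.0), ("bus-1", 50.0)]:
    gps.publish({"vehicle_id": vid, "speed_kmh": speed})

app_a = Consumer(gps)
app_b = Consumer(gps)
result = average_speed_per_vehicle(app_a)
print(result)             # {'bus-1': 45.0, 'bus-2': 30.0}
print(len(app_b.poll()))  # 3, app_b still sees every record
```

Because reading does not remove records, the second consumer still sees the full stream. In a real deployment this role is played, roughly, by Kafka's durable log and per-consumer-group offsets rather than by an in-memory list.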
The platform also provides a graphical dashboard for administration and monitoring that visualises selected metrics of interest. The cluster overview is complemented by Kafka Connect-specific panels that show the running connectors and tasks, as well as data rates per worker node. The dashboards give a good overview of cluster performance and health, and also provide useful information when performance tuning is necessary.
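Per-worker data rates such as those shown on the dashboard are commonly derived from cumulative counters sampled at intervals, as rate = (records_now - records_before) / (t_now - t_before). The sketch below assumes each worker exposes a running total of processed records; the `CounterSample` type, the `rates_per_worker` function, the worker names and the sampling scheme are hypothetical and do not describe how the BMDI dashboard actually collects its metrics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical snapshot of one worker's cumulative record counter."""
    worker: str
    timestamp_s: float  # when the sample was taken, in seconds
    records_total: int  # records processed since the worker started


def rates_per_worker(previous: list[CounterSample],
                     current: list[CounterSample]) -> dict[str, float]:
    """Records per second for each worker between two sampling rounds.

    rate = (records_now - records_before) / (t_now - t_before)
    Workers absent from the previous round are skipped, as are samples
    with no elapsed time. A counter that went backwards (for example
    after a worker restart) is reported as 0.0 rather than a negative rate.
    """
    before = {s.worker: s for s in previous}
    rates: dict[str, float] = {}
    for now in current:
        old = before.get(now.worker)
        if old is None:
            continue
        elapsed = now.timestamp_s - old.timestamp_s
        if elapsed <= 0:
            continue
        delta = now.records_total - old.records_total
        rates[now.worker] = max(delta, 0) / elapsed
    return rates


# Usage: two hypothetical sampling rounds taken 10 seconds apart.
previous = [CounterSample("worker-1", 0.0, 1_000),
            CounterSample("worker-2", 0.0, 500)]
current = [CounterSample("worker-1", 10.0, 6_000),
           CounterSample("worker-2", 10.0, 700)]
rates = rates_per_worker(previous, current)
print(rates)  # {'worker-1': 500.0, 'worker-2': 20.0}
```

Computing rates from cumulative counters, rather than from per-interval counts, keeps the calculation correct even if a dashboard refresh is delayed, since the elapsed time is measured from the samples themselves.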
Several components of the BMDI platform have already been tested with online data streams and other data sources in three pilots of the Track & Know project (more details on the pilots are available on the Track & Know website: https://trackandknowproject.eu/). Advanced testing is underway to make the BMDI a marketable product.