Stream

Project Mission

A growing number of applications require the ability to analyze massive amounts of streaming data in real time. Examples of such applications include market data feed processing and the processing of the output of large-scale ad-hoc networks. The project aims to provide a highly scalable cloud computing platform that enables a new breed of services. Current approaches fail to scale to massive information flows; Stream aims to improve their scalability by one to two orders of magnitude.

The core of the platform is the data streaming platform, StreamCloud, which will be able to parallelize the processing of information flows across large clusters of hundreds of sites. The Stream platform will provide elastic computing, so that computing resources are used only as required by the incoming load. Below the core, a high-performance communication layer enables efficient interaction among sites, with access latencies between node memories of tens of microseconds, compared with tens of milliseconds using current technology. This layer will also provide parallel I/O and low-cost storage for huge amounts of information. Above the core, a data mining layer offers higher-level services that ease the development of applications processing the information flows. On top of the data streaming platform sits the application layer, in which user applications and services will run.

Stream Contributions

  • A cloud computing platform, StreamCloud, for providing real-time services over massive data flows, characterized by:

    • Expressivity: The same expressiveness as state-of-the-art data streaming engines, making it suitable for a wide range of applications that process data such as e-mails, IP packets, stock quotes, etc.
    • Scalability: Scaling with the volume of the data stream, in addition to the number of queries and/or operators, up to hundreds of nodes.
    • Elasticity: Growing and shrinking the number of nodes as needed to cope with the incoming load while minimizing the resources used.

    • Availability: No single point of failure, achieved at low cost.

  • A communication and storage infrastructure characterized by:

    • Low latency and high throughput network communication.

    • High throughput storage able to store streaming data at network rates.

  • A data mining platform, StreamMine, characterized by:

    • The ability to perform online, real-time data analysis in combination with the data streaming platform.

  • Real-life demonstrators in three important business domains: telecommunications, finance applications and e-services.
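The elasticity property listed above (growing and shrinking the number of nodes with the incoming load) can be illustrated with a simple threshold-based provisioning rule. This is a minimal sketch under stated assumptions, not StreamCloud's actual mechanism: the function name `target_nodes`, the model of a fixed per-node capacity, and the 50%/80% utilization band are all hypothetical choices made for illustration.

```python
import math


def target_nodes(incoming_rate: float, per_node_capacity: float,
                 current_nodes: int, lower: float = 0.5, upper: float = 0.8,
                 min_nodes: int = 1) -> int:
    """Hypothetical elasticity rule: return the number of nodes to provision.

    Assumes every node can process `per_node_capacity` events/s and that the
    cluster should run between `lower` and `upper` average utilization.
    """
    if current_nodes < 1 or per_node_capacity <= 0:
        raise ValueError("need at least one node and a positive capacity")

    # Average utilization of the current cluster under the incoming load.
    utilization = incoming_rate / (current_nodes * per_node_capacity)

    # Inside the band: keep the current configuration (avoids oscillation).
    if lower <= utilization <= upper:
        return current_nodes

    # Outside the band: grow or shrink so utilization returns to the band's midpoint.
    target_util = (lower + upper) / 2
    needed = math.ceil(incoming_rate / (per_node_capacity * target_util))
    return max(min_nodes, needed)


if __name__ == "__main__":
    print(target_nodes(1000, 100, 10))  # overloaded cluster grows
    print(target_nodes(650, 100, 10))   # load within band, no change
    print(target_nodes(100, 100, 10))   # underused cluster shrinks
```

Using a band rather than a single threshold is one common way to keep a controller from repeatedly adding and removing nodes when the load hovers near a limit; a real elastic streaming engine would also have to migrate operator state when nodes join or leave, which this sketch ignores.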

 

Stream
Extended Name: Scalable Autonomic Streaming Middleware for Real Time Processing of Massive Data Flows

Project Actions