Master data

Title: An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters
Subtitle:
Abstract:

Existing Data Stream Processing (DSP) systems perform poorly under heavy workloads, particularly on clusters of heterogeneous computers. Elasticity and changes to the application's degree of parallelism can limit the performance degradation caused by varying workloads, which otherwise increase the overall application response time. Elasticity can be achieved by operator scaling, i.e., by replicating and relocating operators at runtime. However, making scaling decisions at runtime is challenging: first, scaling increases the overall communication overhead between operators, and second, it changes the initial scheduling, which can lead to a non-optimal scheduling plan. In this paper, we investigate the problem of elasticity and scaling decisions and propose a DSP system called ER-Storm. To curb communication overhead, we propose a new 3-step mechanism for replicating and relocating operators upon detecting a bottleneck operator that overutilizes a worker node. The other challenge is selecting the proper worker nodes to host relocated operators. By discretizing the input workload, we model the relocation of operators between worker nodes at runtime as a scalable Markov Decision Process (MDP) and use a model-free reinforcement learning method (Q-Learning) to find optimal solutions. We have implemented our proposals on Apache Storm version 2.1.0. Our experimental results show that ER-Storm reduces the average topology response time by 20–60 percent, depending on the input workload rate (low or high), compared to the R-Storm scheduler and the Online-Scheduler of Storm.
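
The abstract's idea of choosing a host worker node with Q-Learning over a discretized workload can be illustrated with a small tabular sketch. This is not ER-Storm's implementation and uses no Apache Storm API. The three workload levels, the node capacities, the communication costs, the cyclic workload pattern and the names `response_time`, `train` and `greedy_policy` are all hypothetical assumptions chosen only to make the example runnable.

```python
import random

# Hypothetical toy cluster: every number below is an illustrative assumption,
# not a value taken from the paper.
LOADS = [1.0, 2.0, 4.0]      # discretized input workload: low, medium, high
CAPACITY = [1.0, 2.0, 4.0]   # processing capacity of three heterogeneous workers
COMM_COST = [0.0, 0.6, 1.2]  # assumed communication delay added by each worker


def response_time(level, node):
    """Assumed cost model: processing delay plus communication delay."""
    return LOADS[level] / CAPACITY[node] + COMM_COST[node]


def train(steps=9000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-Learning: state = workload level, action = host worker node."""
    rng = random.Random(seed)
    n_states, n_actions = len(LOADS), len(CAPACITY)
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        # Reward is the negative response time, so lower delay is better.
        reward = -response_time(state, action)
        # Assumption: the toy workload cycles low -> medium -> high -> low.
        next_state = (state + 1) % n_states
        # Standard Q-Learning update (model-free, off-policy).
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Best worker node per workload level according to the learned Q-table."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Under the assumed cost model, the cheapest node differs per workload level (the small node for low load, the large node for high load), so the learned greedy policy should map each level to a different worker. A real scheduler would replace the cost model with measured response times and a richer state description.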

Keywords: Distributed data stream processing, Elasticity, Scheduling, Resource-awareness, Heterogeneous clusters, Reinforcement learning, Apache Storm
Publication type: Journal article (authorship)
Publication date: 11 July 2022 (online)
Published in: Journal of Supercomputing
(Springer)
Series title: -
Volume number: -
Issue number: -
First publication: Yes
Version: -
Pages: 1-38

Versions

No version available
Publication date: July 2022
ISBN: -
ISSN: 0920-8542
Homepage: https://link.springer.com/article/10.1007/s11227-022-04669-z
Publication date: 11 July 2022
ISBN (e-book): -
eISSN: 1573-0484
DOI: http://dx.doi.org/10.1007/s11227-022-04669-z
Open Access
  • Available online (Open Access)

Affiliation

Organisation / Address
Faculty of Technical Sciences

Institute of Information Technology
Universitaetsstr. 65-67
9020 Klagenfurt am Wörthersee
Austria
   martina.steinbacher@aau.at
http://itec.aau.at/

Categorisation

Subject areas
  • 1020 - Computer Science
Research cluster: No research cluster selected
Citation index
  • Science Citation Index Expanded (SCI Expanded)
Information on the citation index: Master Journal List
Peer Reviewed
  • Yes
Publication focus
  • Science to Science (quality indicator: II)
Classification scheme of the assigned organisational units:
Working groups
  • Distributed Multimedia Systems

Collaborations

Organisation / Address
University of Tehran
Tehran
Iran, Islamic Republic of
IR  Tehran

Contributions of the publication

No linked publications available