Publication: An elastic and traffic-aware scheduler ...
Master data
Title: | An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters |
Subtitle: | |
Abstract: | Existing Data Stream Processing (DSP) systems perform poorly under heavy workloads, particularly on clusters of (heterogeneous) computers. Elasticity, together with changing the application's degree of parallelism, can limit the performance degradation caused by varying workloads, which negatively impacts the overall application response time. Elasticity can be achieved by operator scaling, i.e., by replicating and relocating operators at runtime. However, making scaling decisions at runtime is challenging: first, scaling increases the overall communication overhead between operators, and second, it changes the initial schedule, which can lead to a non-optimal scheduling plan. In this paper, we investigate the problem of elasticity and scaling decisions and propose a DSP system called ER-Storm. To curb communication overhead, we propose a new 3-step mechanism for replicating and relocating operators upon detecting a bottleneck operator that overutilizes a worker node. The other challenge is selecting suitable worker nodes to host the relocated operators. By discretizing the input workload, we model the relocation of operators between worker nodes at runtime as a scalable Markov Decision Process (MDP) and use model-free reinforcement learning (Q-learning) to find optimal solutions. We implemented our proposals on Apache Storm version 2.1.0. Our experimental results show that, compared to the R-Storm scheduler and Storm's Online-Scheduler, ER-Storm reduces the average topology response time by 20–60 percent, depending on the input workload rate (low or high). |
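To illustrate the idea of choosing a host worker with model-free Q-learning over a discretized workload, here is a minimal tabular Q-learning sketch. It is not ER-Storm's implementation; it assumes, hypothetically, that the MDP state is a discretized input-workload level, that an action is the worker node chosen to host a relocated operator, and that the reward is the negative of a simple load-to-capacity ratio. The worker names (`w1` to `w3`), capacities, rate thresholds, the random workload transitions, and the hyperparameters are all illustrative assumptions.

```python
import random

# Hypothetical model, not taken from ER-Storm: states are discretized
# input-workload levels; actions are candidate worker nodes that could
# host a relocated operator.
WORKLOAD_LEVELS = ["low", "medium", "high"]
WORKERS = ["w1", "w2", "w3"]

# Illustrative numbers used only to produce a reward signal.
CAPACITY = {"w1": 1.0, "w2": 2.0, "w3": 4.0}  # relative worker capacity
LOAD = {"low": 1.0, "medium": 2.0, "high": 3.0}  # relative operator load


def discretize(rate, thresholds=(100.0, 1000.0)):
    """Map a raw input rate (tuples/s) to a discrete workload level."""
    if rate < thresholds[0]:
        return "low"
    if rate < thresholds[1]:
        return "medium"
    return "high"


def reward(state, worker):
    """Negative load-to-capacity ratio: less loaded placements score higher."""
    return -LOAD[state] / CAPACITY[worker]


def q_learning(episodes=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning; no transition model is estimated (model-free)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in WORKLOAD_LEVELS for a in WORKERS}
    state = rng.choice(WORKLOAD_LEVELS)
    for _ in range(episodes):
        # Epsilon-greedy: explore a random worker, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.choice(WORKERS)
        else:
            action = max(WORKERS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        # Assumption: the next workload level is drawn at random,
        # independent of the placement decision.
        next_state = rng.choice(WORKLOAD_LEVELS)
        best_next = max(q[(next_state, a)] for a in WORKERS)
        # Q-learning update toward the bootstrapped target r + gamma * max Q(s', a').
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def best_worker(q, state):
    """Greedy policy: the worker with the highest learned Q-value."""
    return max(WORKERS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    table = q_learning()
    for rate in (50.0, 500.0, 5000.0):
        level = discretize(rate)
        print(rate, level, best_worker(table, level))
```

Running the module prints, for each sample input rate, its workload level and the worker with the highest learned Q-value. With these illustrative capacities, the learned greedy policy should favor `w3`, the highest-capacity worker, in every workload level; a real scheduler would derive states and rewards from measured cluster metrics instead.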
Keywords: | Distributed data stream processing, Elasticity, Scheduling, Resource-awareness, Heterogeneous clusters, Reinforcement learning, Apache Storm |
Publication type: | Journal article (authorship) |
Publication date: | 11.07.2022 (online) |
Published in: | Journal of Supercomputing (Springer) |
Series title: | - |
Volume: | - |
Issue: | - |
First publication: | Yes |
Version: | - |
Pages: | pp. 1–38 |
Versions
No version available |
Publication date: | 07.2022 |
ISBN: | - |
ISSN: | 0920-8542 |
Homepage: | https://link.springer.com/article/10.1007/s11227-022-04669-z |
Publication date: | 11.07.2022 |
ISBN (e-book): | - |
eISSN: | 1573-0484 |
DOI: | http://dx.doi.org/10.1007/s11227-022-04669-z |
Open Access |
Authors
Hamid Hadian (internal) |
Mohammadreza Farrokh (external) |
Mohsen Sharifi (external) |
Ali Jafari (external) |
Affiliation
Organization | Address |
---|---|
Fakultät für Technische Wissenschaften, Institut für Informationstechnologie | AT - 9020 Klagenfurt am Wörthersee |
Categorization
Subject areas | |
Research cluster | No research cluster selected |
Citation index | Information on the citation index: Master Journal List |
Peer Reviewed |
Publication focus | Classification scheme of the assigned organizational units: |
Working groups | |
Collaborations
Organization | Address |
---|---|
University of Tehran | IR - Tehran |
Research activities
(Note: external activities are not shown in the search results)
Projects: | |
Publications: | No linked publications |
Events: | No linked events |
Talks: | No linked talks |