Reputation: 9
I have a requirement to show management/the client that the executor memory, number of cores, default parallelism, number of shuffle partitions, and other configuration properties used to run the Spark job are not excessive or higher than required. I need a monitoring tool (with visualization) that lets me justify the memory usage of the Spark job. Additionally, it should surface information such as whether memory is being under-utilized or whether a certain job needs more memory.
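For reference, the job is submitted with settings along the following lines (the values and the script name here are only placeholders, not the real job):

    spark-submit \
      --master yarn \
      --num-executors 10 \
      --executor-memory 4g \
      --executor-cores 2 \
      --conf spark.default.parallelism=200 \
      --conf spark.sql.shuffle.partitions=200 \
      my_spark_job.py

These are the properties I need to defend as "right-sized" rather than over-provisioned.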
Please suggest an application or tool.
Upvotes: 0
Views: 164
Reputation: 14939
LinkedIn has created a tool that sounds very similar to what you're looking for.
For an overview of the product, see this presentation: https://youtu.be/7KjnjwgZN7A?t=480
The LinkedIn team has open-sourced Dr. Elephant here: https://github.com/linkedin/dr-elephant
Give it a try. Note that the setup may require manual tweaking of the Spark History Server during the initial integration so that Dr. Elephant can get the information it requires.
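As a rough illustration, that tweaking usually amounts to making sure Spark event logs are written somewhere the History Server can read them, since that is where the analysis data comes from. Something along these lines in spark-defaults.conf (the HDFS path is just a placeholder for your environment):

    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///spark-history
    spark.history.fs.logDirectory    hdfs:///spark-history

Once the History Server is serving completed applications from that directory, Dr. Elephant can pick them up and score them for things like over-allocated executor memory and skewed or idle tasks.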
Upvotes: 1