Pablo Fernández

Reputation: 81

Retrieve graphical information using Spark Structured Streaming

Spark Streaming provided a "Streaming" tab within the deployed Web UI (http://localhost:4040 for running applications or http://localhost:18080 for completed applications, both by default) for each executed application, where graphs representing application performance could be viewed. This tab is no longer available when using Spark Structured Streaming. In my case, I am developing a streaming application with Spark Structured Streaming that reads from a Kafka broker, and I would like to obtain a graph of records processed per second, like the one I could get with Spark Streaming, among other graphical information.

What is the best alternative to achieve this? I am using Spark 3.0.1 (via the pyspark library) and deploying my application on a YARN cluster.

I've checked Monitoring Structured Streaming Applications Using Web UI by Jacek Laskowski, but it is still not clear to me how to obtain this type of information graphically.

Thank you in advance!

Upvotes: 1

Views: 916

Answers (2)

Pablo Fernández

Reputation: 81

I managed to get what I wanted. For a reason I still don't know, the Spark History Server UI for completed applications (on http://localhost:18080 by default) did not show the new "Structured Streaming" tab that is available for Structured Streaming applications run on Spark 3.0.1. However, the Web UI I accessed at http://localhost:4040 does show the information I wanted to retrieve. You just need to click the 'runId' link of the streaming query whose statistics you want to see.

(Screenshot: Spark Structured Streaming app Web UI on port 4040)

If you can't see this tab, based on my personal experience, I recommend the following:

  • Upgrade to the latest Spark version (currently 3.0.1)
  • Consult this information on the UI deployed at port 4040 while the application is running, rather than on port 18080 after it has finished.

I found the official Web UI documentation for the latest Apache Spark release very useful for this.
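As a side note, if you don't find the UI on port 4040, Spark binds the next free port (4041, 4042, ...) when 4040 is already taken. Here is a minimal sketch, assuming an existing SparkSession (created here just for illustration), that prints the address the driver actually bound:

```python
# Minimal sketch: print the Web UI address actually bound by the driver.
# Assumes PySpark 3.x; the app name is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ui-url-check").getOrCreate()

print(spark.version)                # confirm the application runs on Spark 3.x
print(spark.sparkContext.uiWebUrl)  # actual Web UI URL (port may not be 4040)
```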

Upvotes: 1

maxime G

Reputation: 1771

Most of the metric information you see in the Spark UI is exported by Spark itself.

If the Spark UI doesn't fit your requirements, you can retrieve these metrics and process them yourself.
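For example, each streaming query exposes its progress reports programmatically. Here is a minimal PySpark sketch (broker address, topic name and sink are placeholders, and it assumes the spark-sql-kafka connector is on the classpath) that polls processedRowsPerSecond, the records-per-second figure the old Streaming tab used to plot:

```python
# Minimal sketch: poll a streaming query's progress and print its
# records-per-second rate. Broker, topic and sink are placeholders.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("progress-polling").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
          .option("subscribe", "my_topic")                      # placeholder topic
          .load())

query = (stream.writeStream
         .format("console")                                     # placeholder sink
         .start())

# lastProgress returns the most recent StreamingQueryProgress as a dict
# (or None before the first batch completes).
while query.isActive:
    progress = query.lastProgress
    if progress is not None:
        print(progress.get("processedRowsPerSecond"), "rows/s")
    time.sleep(10)
```

query.recentProgress returns the last few progress reports as a list, which may be more convenient if you want to collect a window of samples and plot them with an external tool.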

You can use a sink to export the data, for example to CSV or Prometheus, or retrieve it via the REST API.
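As a rough illustration of the sink approach (the option names below are the standard Spark metrics settings, but the output directory is a placeholder), streaming metrics can be published to Spark's metrics system and routed to the built-in CSV sink when the session is built:

```python
# Minimal sketch: enable Structured Streaming metrics and export them
# through the built-in CSV sink. The output directory is a placeholder.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("metrics-sink-example")
         # publish Structured Streaming metrics (processing rate, latency, ...)
         # to Spark's metrics system
         .config("spark.sql.streaming.metricsEnabled", "true")
         # route all metrics to the CSV sink, written every 10 seconds
         .config("spark.metrics.conf.*.sink.csv.class",
                 "org.apache.spark.metrics.sink.CsvSink")
         .config("spark.metrics.conf.*.sink.csv.period", "10")
         .config("spark.metrics.conf.*.sink.csv.unit", "seconds")
         .config("spark.metrics.conf.*.sink.csv.directory", "/tmp/spark-metrics")  # placeholder
         .getOrCreate())
```

If you prefer scraping over files, Spark 3.0 also ships a PrometheusServlet sink (org.apache.spark.metrics.sink.PrometheusServlet) that can be configured the same way.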

You should take a look at the Spark monitoring documentation: https://spark.apache.org/docs/latest/monitoring.html

Upvotes: 0
