Reputation: 73
As of Spark 2.2.0, there's are new endpoints in the API for getting information about streaming jobs.
I run Spark on EMR clusters, using Spark 2.2.0 in cluster mode.
When I hit the endpoint for my streaming jobs, all it gives me is the error message:
no streaming listener attached to <stream name>
I've dug through the Spark codebase a bit, but this feature is not very well documented. So I'm curious if this is a bug? Is there some configuration I need to do to get this endpoint working?
This appears to be an issue specifically when running on the cluster. The same code running on Spark 2.2.0 on my local machine shows the statistics as expected, but gives that error message when run on the cluster.
Upvotes: 2
Views: 3126
Reputation: 19
If you look at the output of PyCharm in the console window it will show what port it used streaming on. I was assuming it was 4040 but when i checked the output carefully the port was on 4041. Here is the output:
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Then you can use localhost:4041 on any web browser and you should see the streaming output. Hope this helps!
Upvotes: 0
Reputation: 15222
TL;DR Just go to: http://localhost:4040/streaming
Had a same issue. I ran spark application from Pycharm Python virtual environment. Spark reported that port 4040 was taken:
Spark context Web UI available at http://192.168.100.221:4042
but I saw no jobs there and Streaming tab missing. Then I went to http://localhost:4040/streaming and behold, everything was there.
Upvotes: 0
Reputation: 74759
I'm using the very latest Spark 2.3.0-SNAPSHOT built today from the master so YMMV. It worked fine.
Is there some configuration I need to do to get this endpoint working?
No. It's supposed to work fine with no changes to the default configuration.
Make sure the you use the host and port of the driver (as rumors are that you could also access 18080
of Spark History Server that does show all the same endpoints, and the same jobs running, but no streaming listener attached).
As you can see in the source code where the error message lives it can happen only when ui.getStreamingJobProgressListener
has not been registered (that ends up in case None
).
So the question now is why would that SparkListener
not be registered?
That leads us to the streamingJobProgressListener var that is set using setStreamingJobProgressListener method exclusively while StreamingTab
is being instantiated (which was the reason why I asked you if you can see the Streaming tab).
In other words, if you see the Streaming tab in web UI, you have the streaming metric endpoint(s) available. Check the URL to the endpoint which should be in the format:
http://[driverHost]:[port]/api/v1/applications/[appId]/streaming/statistics
I tried to reproduce your case and did the following that led me to a working case.
Started one of the official examples of Spark Streaming applications.
$ ./bin/run-example streaming.StatefulNetworkWordCount localhost 9999
I did run nc -lk 9999
first.
Opened the web UI @ http://localhost:4040/streaming to make sure the Streaming tab is there.
Made sure http://localhost:4040/api/v1/applications/ responds with application ids.
$ http http://localhost:4040/api/v1/applications/
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 266
Content-Type: application/json
Date: Wed, 13 Dec 2017 07:58:04 GMT
Server: Jetty(9.3.z-SNAPSHOT)
Vary: Accept-Encoding, User-Agent
[
{
"attempts": [
{
"appSparkVersion": "2.3.0-SNAPSHOT",
"completed": false,
"duration": 0,
"endTime": "1969-12-31T23:59:59.999GMT",
"endTimeEpoch": -1,
"lastUpdated": "2017-12-13T07:53:53.751GMT",
"lastUpdatedEpoch": 1513151633751,
"sparkUser": "jacek",
"startTime": "2017-12-13T07:53:53.751GMT",
"startTimeEpoch": 1513151633751
}
],
"id": "local-1513151634282",
"name": "StatefulNetworkWordCount"
}
]
Accessed the endpoint for the Spark Streaming application @ http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics.
$ http http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 219
Content-Type: application/json
Date: Wed, 13 Dec 2017 08:00:10 GMT
Server: Jetty(9.3.z-SNAPSHOT)
Vary: Accept-Encoding, User-Agent
{
"avgInputRate": 0.0,
"avgProcessingTime": 30,
"avgSchedulingDelay": 0,
"avgTotalDelay": 30,
"batchDuration": 1000,
"numActiveBatches": 0,
"numActiveReceivers": 1,
"numInactiveReceivers": 0,
"numProcessedRecords": 0,
"numReceivedRecords": 0,
"numReceivers": 1,
"numRetainedCompletedBatches": 376,
"numTotalCompletedBatches": 376,
"startTime": "2017-12-13T07:53:54.921GMT"
}
Upvotes: 1