Dina Dinesh

Reputation: 23

How does Spark process XML files?

How does Spark process XML files in a distributed manner? An XML file is not splittable, right? Will it be processed by only a single node? I'm a little confused, so it would be helpful if someone could help me with this. Thanks in advance.

Upvotes: 0

Views: 160

Answers (1)

Kevin Wu

Reputation: 21

I ran into the same question in a recent Spark use case. From what I observed in the Spark Web UI, an XML file does indeed seem not to be splittable, but the transformation (read, parse, etc.) seems to be handled by multiple nodes in a distributed manner. In summary: if you have 100 XML files to read and process and 10 nodes, and each node works on one file at a time, then only 10 files are processed at once before moving on to the next batch of 10 (10 -> 20 -> 30 ... 100).
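To make the batch arithmetic concrete, here is a minimal sketch in plain Python (not Spark code). It assumes one whole file per task, one task slot per node, and tasks that all take the same time; the function name `processing_waves` and its parameters are made up for illustration.

```python
import math


def processing_waves(num_files: int, num_slots: int) -> list[int]:
    """Return the cumulative number of files finished after each wave.

    Assumptions (hypothetical, for illustration only): each file is read
    by exactly one task, each slot runs one task at a time, and all
    tasks take the same amount of time.
    """
    if num_files < 0 or num_slots <= 0:
        raise ValueError("num_files must be >= 0 and num_slots must be > 0")
    # Number of rounds needed to get through all files.
    waves = math.ceil(num_files / num_slots)
    # After wave k, at most k * num_slots files are done, capped at the total.
    return [min(wave * num_slots, num_files) for wave in range(1, waves + 1)]


progress = processing_waves(num_files=100, num_slots=10)
print(progress)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

With 25 files and 10 slots the same function gives three waves (10, 20, 25), so the last wave leaves some slots idle. On a real cluster the slot count may differ from the node count, so treat the numbers as a rough model only.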

Upvotes: 1
