Reputation: 2320
I am loading a csv text file from s3 into spark, filtering and mapping the records and writing the result to s3.
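In essence the job looks like this (a simplified sketch; the bucket names, paths and the actual filter/map logic are placeholders, not my real code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the job; paths and transformation logic are placeholders.
val conf = new SparkConf().setAppName("csv-filter-map")
val sc   = new SparkContext(conf)

val lines = sc.textFile("s3n://my-bucket/input/data.csv")   // read the CSV as plain text lines

val result = lines
  .filter(line => line.nonEmpty)                            // placeholder filter
  .map(line => line.split(",")(0))                          // placeholder mapping: keep first column

result.saveAsTextFile("s3n://my-bucket/output/")            // write the result back to S3
```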
I have tried several input sizes: 100k rows, 1M rows & 3.5M rows.
The former two finish successfully, while the latter (3.5M rows) hangs in some weird state: the job stages monitoring web app (the one on port 4040) stops, and the command-line console gets stuck and does not even respond to Ctrl-C. The master's web monitoring app still responds and shows the state as FINISHED.
In S3, I see an empty directory with a single zero-sized entry _temporary_$folder$. The S3 URL is given using the s3n:// protocol.
I did not see any errors in the logs in the web console. I also tried several cluster sizes (1 master + 1 worker, 1 master + 5 workers) and ended up in the same state.
Has anyone encountered such an issue? Any idea what's going on?
Upvotes: 5
Views: 2330
Reputation: 111
It's possible you are running up against the 5GB object limitation of the s3n FileSystem. You may be able to get around this by using the s3 FileSystem (not s3n), or by partitioning your output.
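For the partitioning route, a rough sketch (assuming your job ends with saveAsTextFile on an RDD, here called result; the partition count and paths are only illustrative):

```scala
// Option 1: spread the output over more part files so each one stays well under 5GB.
// 200 partitions is an illustrative number; choose it based on your total output size.
result.repartition(200).saveAsTextFile("s3n://my-bucket/output/")

// Option 2: write through the S3 block filesystem instead of s3n.
// This requires a bucket dedicated to the block filesystem (see the wiki excerpt below).
result.saveAsTextFile("s3://my-dedicated-bucket/output/")
```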
Here's what the AmazonS3 - Hadoop Wiki says:
S3 Native FileSystem (URI scheme: s3n) A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. [...] The disadvantage is the 5GB limit on file size imposed by S3.
...
S3 Block FileSystem (URI scheme: s3) A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem [...] The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.
...
AmazonS3 (last edited 2014-07-01 13:27:49 by SteveLoughran)
Upvotes: 2