formicaman

PySpark only writes a '_temporary' folder when writing Parquet

I am trying to write a PySpark DataFrame to Parquet like this:

df.write.format("parquet").\
mode('overwrite').\
save('gs://my_bucket/my_folder/filename')

This DataFrame has millions of rows, but I have written similar DataFrames before in a few minutes. This write, however, runs for 30+ minutes, and the only thing I can see under the output path is _temporary/0/.

I can easily write a small DataFrame the same way and confirm that it works, but for some reason this one does not. There doesn't appear to be anything wrong with the DataFrame itself.

Apart from a problem with the DataFrame, what else could make the write take this long without anything being written? Other DataFrames of similar size have written without issues.

Answers (1)

Arran Duff

  • Your files won't appear until the Spark job has completed.
  • Once your job has completed successfully, you will see the files.
  • This is explained in "Spark _temporary creation reason".
  • You may be able to see your final files being created inside the _temporary directory before they are moved to their final destination.
  • However, remember that Spark must complete all tasks in a stage before the stages that depend on it can start. If one of your tasks gets stuck in a stage before the write stage, your job may appear frozen and you will not see any files being written.
  • Your best bet for debugging this is the Spark UI. It gives a clear visual view of how all your tasks are progressing through the stages.
  • The most common reason for tasks getting stuck is partition skew, where one task does much more work than the others and therefore takes much longer to complete. There are also other reasons why your job may appear frozen. Again, the Spark UI is really the best (or only) way to get a good understanding of how your job is progressing.
  • In any event, the Spark UI is always helpful for understanding bottlenecks or stalled jobs.
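If you want a quick numeric check of the partition-skew explanation alongside the Spark UI, one rough approach is to count the rows in each partition of the DataFrame you are about to write. This is a minimal sketch, not a standard tool: `rows_per_partition` and `skew_summary` are hypothetical helper names, and the only Spark API it relies on is `DataFrame.rdd` with `RDD.mapPartitions` and `collect`. Two assumptions to keep in mind: counting forces Spark to evaluate the DataFrame's whole lineage, so on a genuinely stuck job it may be slow too, and it only reflects the DataFrame's current partitioning, not skew inside an earlier stage.

```python
from statistics import median
from typing import Dict, List, Union


def rows_per_partition(df) -> List[int]:
    # Hypothetical helper: one row count per partition, in partition order.
    # Counting inside mapPartitions streams the rows instead of building a
    # full list per partition in memory (as rdd.glom() would).
    # NOTE: this triggers evaluation of df's entire lineage (joins,
    # aggregations, ...), so it can itself take a long time.
    return df.rdd.mapPartitions(lambda rows: [sum(1 for _ in rows)]).collect()


def skew_summary(counts: List[int]) -> Dict[str, Union[int, float]]:
    # Hypothetical helper: summarise how unevenly rows are spread.
    if not counts:
        raise ValueError("expected at least one partition count")
    largest = max(counts)
    typical = median(counts)
    if typical:
        ratio = largest / typical
    else:
        # The median partition is empty: any non-empty partition is
        # "infinitely" larger; if every partition is empty, call it even.
        ratio = float("inf") if largest else 1.0
    return {
        "partitions": len(counts),
        "total_rows": sum(counts),
        "max_rows": largest,
        "median_rows": typical,
        "max_over_median": ratio,
    }


if __name__ == "__main__":
    # Hand-made counts standing in for real ones: three balanced
    # partitions and one "hot" partition.
    print(skew_summary([10, 10, 10, 1000]))
```

On a real job you would call `skew_summary(rows_per_partition(df))` on the DataFrame from the question. A max/median ratio close to 1 means rows are spread evenly. A very large ratio means one task handles far more rows than a typical one, which would fit the "one stuck task" symptom described above. If the counts look balanced, the per-task metrics for earlier stages in the Spark UI are still the place to look.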
