I am trying to write a PySpark DataFrame to Parquet like this:
df.write.format("parquet") \
    .mode("overwrite") \
    .save("gs://my_bucket/my_folder/filename")
This data frame has millions of rows, but I have written a similar data frame before in a few minutes. This one, however, has been running for 30+ minutes, and the output path contains only _temporary/0/, with nothing else under it.
I can easily write a small data frame and see that it works, but for some reason this one does not. There doesn't appear to be anything wrong with the data frame itself.
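For reference, a sanity check along these lines succeeds almost instantly (the data and the output folder name here are just placeholders, and spark is my active SparkSession):

# Tiny DataFrame written with the exact same writer settings
small_df = spark.createDataFrame(
    [(1, "a"), (2, "b")],
    ["id", "value"],
)
small_df.write.format("parquet") \
    .mode("overwrite") \
    .save("gs://my_bucket/my_folder/sanity_check")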
Could there be any reason, other than an issue with the data frame itself, why this is taking forever and nothing is being written? Other similarly sized data frames have had no issues.
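In case it helps narrow things down, one guess on my part is that partitioning matters, since the number of output files (and write tasks) follows the partition count. It can be inspected like this:

# My assumption: an unusually large or skewed partition count
# could explain a write that stalls in the commit phase.
print(df.rdd.getNumPartitions())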