am10

Reputation: 499

Reading Csv file written by Dataframewriter Pyspark

I had a dataframe which I wrote to CSV using the code below:

df.write.format("csv").save(base_path+"avg.csv")

As I am running Spark in client mode, the snippet above created a folder named avg.csv on my worker node (or a nested folder), and inside that folder are files named part-*.csv.

Now, when I try to read avg.csv back, I get a "path does not exist" error:

spark.read.format("com.databricks.spark.csv").load(base_path+"avg.csv")

Can anybody tell me where I am going wrong?

Upvotes: 1

Views: 197

Answers (1)

Jim Todd

Reputation: 1588

The part-00* files are the output of a distributed computation (e.g. MapReduce or Spark): each task writes its own part file. So when you save a DataFrame, you will always get a folder containing part files rather than a single CSV file. Keep this in mind when reading the output back.

So, try using:

spark.read.format("com.databricks.spark.csv").load(base_path+"avg.csv/*")
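To see why the glob works, here is a minimal pure-Python sketch (no Spark required, and the file names are made up for illustration) of the folder layout Spark's DataFrameWriter produces and how a glob over it collects every partition's rows, which is essentially what Spark's reader does when pointed at the folder:

```python
import csv
import glob
import os
import tempfile

# Simulate the layout DataFrameWriter produces: "avg.csv" is a
# directory, and each task's output is a separate part-* file.
base_path = tempfile.mkdtemp()
out_dir = os.path.join(base_path, "avg.csv")
os.makedirs(out_dir)

# Two "partitions" worth of rows, written as two part files.
with open(os.path.join(out_dir, "part-00000"), "w", newline="") as f:
    csv.writer(f).writerows([["a", "1"], ["b", "2"]])
with open(os.path.join(out_dir, "part-00001"), "w", newline="") as f:
    csv.writer(f).writerows([["c", "3"]])

# Opening "avg.csv" as a single file fails -- it is a directory.
# Globbing "avg.csv/*" picks up every part file instead.
rows = []
for part in sorted(glob.glob(os.path.join(out_dir, "*"))):
    with open(part, newline="") as f:
        rows.extend(csv.reader(f))

print(rows)  # [['a', '1'], ['b', '2'], ['c', '3']]
```

Note that recent Spark versions will also accept the bare directory path (without the /*), since the reader scans the folder for part files itself.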

Upvotes: 2
