Anos

Reputation: 57

How to merge CSV files into a single Parquet file inside a folder in PySpark?

I want to merge three CSV files into a single Parquet file using PySpark.

Below is my S3 path. The folder for the 10th contains three files, and I want to merge them into a single Parquet file:

"s3://lla.raw.dev/data/shared/sap/orders/2022/09/10/orders1.csv,orders2.csv,orders3.csv"

Desired single output file:

"s3://lla.raw.dev/data/shared/sap/orders/" (as a Parquet file)

Upvotes: 0

Views: 722

Answers (1)

pltc

Reputation: 6082

Just read the CSVs and write them out as Parquet:

(spark
    # read all CSV files in the folder into one DataFrame
    .read.csv('s3://lla.raw.dev/data/shared/sap/orders/2022/09/10/')

    # coalesce to a single partition so Spark writes a single output file
    .coalesce(1)

    # write to Parquet
    .write
    .parquet('s3://lla.raw.dev/data/shared/sap/orders/parquet')
)

Upvotes: 1
