Reputation: 57
I want to merge three CSV files into a single Parquet file using PySpark.
Below is my S3 path. The folder for the 10th contains three files, and I want to merge them into a single Parquet file:
"s3://lla.raw.dev/data/shared/sap/orders/2022/09/10/orders1.csv,orders2.csv,orders3.csv"
Desired single output file:
"s3://lla.raw.dev/data/shared/sap/orders/parquet file"
Upvotes: 0
Views: 722
Reputation: 6082
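Just read the CSVs and write them out as Parquet. Pointing the reader at the folder picks up every file in it, and coalesce(1) leaves a single part file inside the output directory: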
(spark
    # read every CSV file in the folder
    .read.csv('s3://lla.raw.dev/data/shared/sap/orders/2022/09/10/')
    # collapse to one partition so only one file is written
    .coalesce(1)
    # write to parquet
    .write
    .parquet('s3://lla.raw.dev/data/shared/sap/orders/parquet')
)
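If you only want those three files rather than everything in the folder, a rough sketch along these lines might help. The helper name `merge_csvs_to_parquet` and its parameters are made up for illustration, and `header=True` and `mode="overwrite"` are assumptions about your data and about what should happen when the output path already exists, so adjust both as needed. It expects an existing SparkSession to be passed in as `spark`.

```python
def merge_csvs_to_parquet(spark, csv_paths, out_dir, header=True, mode="overwrite"):
    """Read one or more CSV paths and write them as a single Parquet part file.

    Hypothetical helper: `spark` is an existing SparkSession, `csv_paths` is a
    path string or a list of path strings, `out_dir` is the output directory.
    """
    # DataFrameReader.csv accepts a single path or a list of paths.
    # Assumption: the files have a header row. Without inferSchema or an
    # explicit schema, every column is read as a string.
    df = spark.read.csv(csv_paths, header=header)

    (
        df
        # one partition means Spark writes one part-*.parquet file
        .coalesce(1)
        .write
        # assumption: replace any existing output at out_dir
        .mode(mode)
        .parquet(out_dir)
    )
```

With a session in hand you would call it roughly as `merge_csvs_to_parquet(spark, [base + "orders1.csv", base + "orders2.csv", base + "orders3.csv"], "s3://lla.raw.dev/data/shared/sap/orders/parquet")`, where `base` is the `.../2022/09/10/` folder from the question. Note that the output path is still a directory: the single Parquet file inside it gets a generated `part-...` name.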
Upvotes: 1