Pyspark - Export DataFrame to Text

Question

I'm working on some PySpark tasks.

I am using a Parquet file with 3 columns as the source.

One of the tasks requires exporting my DataFrame to a tab-delimited text file. I can get close with the following operation:

`df.write.option("text").csv("output_file")`

However, this exports a CSV file, not a text file. The only way I found to export a text file was to write a single column, but with that approach I lose the built-in delimiter handling and have to join the columns myself. For example:

from pyspark.sql.functions import concat_ws
df = df.select(concat_ws('\t', *df.columns).alias('data'))

What is the closest way to export a text file with delimiters, as I did with the CSV export? For example, in Scala this is very simple:

df.map(row => row.mkString("\t")).write.text("")

Is there an equivalent in Python?

Thanks!

Czaporka · Accepted Answer

Your attempt with the csv method was almost correct; you only need to change the delimiter from the default (comma) to a tab:

df.write.option("sep", "\t").csv("output_file")

Note that CSV is actually a text format (you can view it with a text editor; it contains tabular data where rows are separated by newline characters and fields are separated by commas). The tab-delimited variant is sometimes called TSV.
