Pedro Alves

PySpark - Export DataFrame to Text

I'm working on some PySpark tasks.

I am using a Parquet file with 3 columns as the source.

One of them requires exporting my DataFrame to a tab-delimited text file. I can do this using the following operation:

```python
df.write.option("text").csv("output_file")
```

However, this exports a CSV file, not a text file. The only way I found to export a text file was to export a single column, but with that approach I lose the delimiter. For example:

```python
from pyspark.sql.functions import concat_ws

df = df.select(concat_ws('\t', *df.columns).alias('data'))
```

What is the closest way to export a text file with delimiters, as I did for the CSV export? For example, in Scala this is very simple:

```scala
df.map(row => row.mkString("\t")).write.text("")
```

Is there an equivalent in Python?

Thanks!

Answers (1)

Czaporka

Your attempt with the `csv` method was almost correct; you only need to change the delimiter from the default (comma) to a tab:

```python
df.write.option("sep", "\t").csv("output_file")
```
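If you prefer the Scala-style approach from the question, a rough Python counterpart maps each `Row` to a tab-joined string through the DataFrame's underlying RDD and writes the result with `saveAsTextFile`. This is a minimal sketch assuming an existing PySpark DataFrame `df`; the helper names `row_to_tab_line` and `write_tab_delimited` are hypothetical, and writing `None` as an empty string is an assumption, not something the question specifies.

```python
from typing import Any, Iterable


def row_to_tab_line(row: Iterable[Any]) -> str:
    """Join the fields of one row with tabs (hypothetical helper).

    A pyspark.sql.Row is a tuple subclass, so iterating over it yields the
    field values in column order. None is written as an empty string here,
    which is an assumption; change it if you need an explicit null marker.
    """
    return "\t".join("" if value is None else str(value) for value in row)


def write_tab_delimited(df, path: str) -> None:
    """Write a PySpark DataFrame as tab-delimited text (hypothetical helper).

    df.rdd yields Row objects; each one is mapped to a single string, and
    saveAsTextFile writes one line per element into the output directory.
    """
    df.rdd.map(row_to_tab_line).saveAsTextFile(path)
```

For example, `write_tab_delimited(spark.read.parquet("input.parquet"), "output_file")`, where the input path is a placeholder. This sketch does no quoting or escaping, so a field that itself contains a tab or newline would break the layout.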

Note that CSV is actually a text format: you can view it with a text editor, and it contains tabular data where rows are separated by newline characters and fields are separated by commas. The tab-delimited variant is sometimes called TSV.
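To see that a tab-delimited file is ordinary text, here is a small standard-library sketch, independent of Spark, that writes rows with Python's `csv` module using `delimiter="\t"`. The function name `rows_to_tsv` and the sample rows are made up for illustration.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize rows to tab-separated text using the standard csv module."""
    buffer = io.StringIO()
    # delimiter="\t" turns the CSV writer into a TSV writer;
    # lineterminator="\n" replaces the writer's default "\r\n" line endings.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


sample = [(1, "Alice", 3.5), (2, "Bob", 4.0)]  # made-up example rows
print(rows_to_tsv(sample), end="")
```

The result is plain text with one row per line and a tab between fields, which is the same shape of line that Spark's CSV writer produces when `sep` is set to a tab.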
