Anish Manocha

Reputation: 25

Converting a one-column Spark DataFrame into a single string delimited by the pipe character in Python

I have a spark data frame, which contains one column of information. It looks like:

Name
----------
Bob
----------
Dan

I want to convert this into a single string, delimited by pipe characters.

"Bob|Dan"

How would I go about doing so in Python (pyspark)? Currently, I'm creating the dataframe via

df = sqlContext.sql("Select name from db")

If you could help lead me in a certain direction, I'd appreciate it.

Upvotes: 2

Views: 2366

Answers (2)

Suresh

Reputation: 5870

You can use collect_list and concat_ws from the pyspark.sql.functions module:

>>> from pyspark.sql import functions as F
>>> df.select(F.concat_ws('|',F.collect_list(df.name)).alias('name')).show()
+-------+
|   name|
+-------+
|Bob|Dan|
+-------+

Upvotes: 2

Ezer K

Reputation: 3739

Does this help?

df = sqlContext.createDataFrame([{'name':'Bob'},{'name':'Dan'}])

# In Python 3, dict.values() is a view and cannot be indexed, so read the field by name
'|'.join(str(row['name']) for row in df.collect())
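
If it helps, the join step can be pulled into a small helper once the rows are on the driver. This is only a sketch: join_column is a made-up name, and the list of dicts below is a stand-in for df.collect() so the snippet runs without a Spark session (real Row objects can be indexed by field name the same way). It skips None values, which mirrors how collect_list and concat_ws ignore nulls.

```python
def join_column(rows, column="name", sep="|"):
    """Join one field from already-collected rows into a single string.

    `rows` is assumed to be the result of df.collect(), or any sequence of
    objects that can be indexed with `column`. None values are skipped.
    """
    return sep.join(str(row[column]) for row in rows if row[column] is not None)


# Stand-in for df.collect(); plain dicts are used here for illustration only.
rows = [{"name": "Bob"}, {"name": "Dan"}, {"name": None}]
print(join_column(rows))  # Bob|Dan
```

Keep in mind that collect() brings every row back to the driver, so this only makes sense when the column is small, and that the row order is whatever Spark returns unless the query sorts it.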

Upvotes: 2
