Reputation: 25
I have a Spark DataFrame that contains a single column. It looks like:
Name
----------
Bob
----------
Dan
I want to convert this into a single string, delimited by pipe characters:
"Bob|Dan"
How would I go about doing so in Python (pyspark)? Currently, I'm creating the dataframe via
df = sqlContext.sql("Select name from db")
If you could help lead me in a certain direction, I'd appreciate it.
Upvotes: 2
Views: 2366
Reputation: 5870
You can use collect_list and concat_ws from the pyspark.sql.functions module:
>>> from pyspark.sql import functions as F
>>> df.select(F.concat_ws('|',F.collect_list(df.name)).alias('name')).show()
+-------+
|   name|
+-------+
|Bob|Dan|
+-------+
Upvotes: 2
Reputation: 3739
Does this help?
df = sqlContext.createDataFrame([{'name':'Bob'},{'name':'Dan'}])
# Index each Row by column name; values()[0] is not subscriptable on Python 3
'|'.join(str(row['name']) for row in df.collect())
Upvotes: 2