Ron

Reputation: 207

Pivot and Concatenate columns in pyspark dataframe

I have the dataframe below and need to collapse it into a single row, with all the marks fields concatenated using a delimiter such as a pipe.
So: PACKAGING MARKS 3|PACKAGING MARKS 2|PACKAG.....

The number of marks records can vary for each mid.

mid  marksId  id  index  marks
2    3        3   2      PACKAGING MARKS 3
2    3        3   1      PACKAGING MARKS 2
2    3        3   0      PACKAGING MARKS 1
2    4        4   2      PACKAGING MARKS 23
2    4        4   1      PACKAGING MARKS 22
2    4        4   0      PACKAGING MARKS 21

Thanks

Upvotes: 1

Views: 612

Answers (1)

bzu

Reputation: 1594

Assuming you want 1 delimited string for each "mid", you can collect all "marks" with collect_list() and use concat_ws() to create the string:

import pyspark.sql.functions as F

df.groupby('mid').agg(F.concat_ws('|', F.collect_list('marks')).alias('marks_str')).show(truncate=False)

Upvotes: 1
