RKC

Reputation: 27

How to concatenate data frame column values in PySpark?

I have created a data frame using the code below:

df = spark.createDataFrame([("A", "20"), ("B", "30"), ("D", "80"),("A", "120"),("c", "20"),("Null", "20")],["Let", "Num"])

df.show()
+----+---+
| Let|Num|
+----+---+
|   A| 20|
|   B| 30|
|   D| 80|
|   A|120|
|   c| 20|
|Null| 20|
+----+---+

I want to create a data frame like the one below:

+----+-------+
| Let|Num    |
+----+-------+
|   A| 20,120|
|   B| 30    |
|   D| 80    |
|   c| 20    |
|Null| 20    |
+----+-------+

How can I achieve this?

Upvotes: 0

Views: 50

Answers (1)

koiralo

Reputation: 23119

You can groupBy Let and collect the Num values as a list with collect_list:

from pyspark.sql import functions as F

df.groupBy("Let").agg(F.collect_list("Num")).show()

Output as List:

+----+-----------------+
| Let|collect_list(Num)|
+----+-----------------+
|   B|             [30]|
|   D|             [80]|
|   A|        [20, 120]|
|   c|             [20]|
|Null|             [20]|
+----+-----------------+
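
Note that collect_list does not guarantee the order of the collected values after a shuffle. If the order matters, you could sort each array, for example with sort_array (Num is a string here, so cast it to sort numerically):

# cast Num to int so the array sorts numerically, then sort each collected list
df.groupBy("Let").agg(F.sort_array(F.collect_list(F.col("Num").cast("int")))).show()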

To get a comma-separated string instead of a list, wrap collect_list in concat_ws:

df.groupBy("Let").agg(F.concat_ws(",", F.collect_list("Num"))).show()

Output as String:

+----+-------------------------------+
| Let|concat_ws(,, collect_list(Num))|
+----+-------------------------------+
|   B|                             30|
|   D|                             80|
|   A|                         20,120|
|   c|                             20|
|Null|                             20|
+----+-------------------------------+
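
If you want the result column to be named Num, as in your expected output, you can also add an alias, for example:

# alias the aggregated column so it shows up as "Num" instead of the generated name
df.groupBy("Let").agg(F.concat_ws(",", F.collect_list("Num")).alias("Num")).show()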

Upvotes: 1
