Charbel
Charbel

Reputation: 73

Concatenate column names in a new column based on their values

id col1 col2 ones twos
1 0 1 col2
2 1 2 col1 col2
3 0 1 col2
4 1 1 col1, col2
5 2 2 col1,col2
6 0 1 col2

I need to collect the column names containing the 1s and 2s into one column each.

Upvotes: 1

Views: 706

Answers (1)

ZygD
ZygD

Reputation: 24356

concat_ws with a smart when condition may do it.

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [(1, 0, 1),
     (2, 1, 2),
     (3, 0, 1),
     (4, 1, 1),
     (5, 2, 2),
     (6, 0, 1)],
    ['id', 'col1', 'col2']
)
df = df.select(
    '*',
    F.concat_ws(', ', *[F.when(F.col(c) == 1, c) for c in {'col1', 'col2'}]).alias('ones'),
    F.concat_ws(', ', *[F.when(F.col(c) == 2, c) for c in {'col1', 'col2'}]).alias('twos'),
)
df.show()
# +---+----+----+----------+----------+
# | id|col1|col2|      ones|      twos|
# +---+----+----+----------+----------+
# |  1|   0|   1|      col2|          |
# |  2|   1|   2|      col1|      col2|
# |  3|   0|   1|      col2|          |
# |  4|   1|   1|col1, col2|          |
# |  5|   2|   2|          |col1, col2|
# |  6|   0|   1|      col2|          |
# +---+----+----+----------+----------+

Upvotes: 2

Related Questions