Reputation: 282
I am working with pyspark below is my dataframe
col_list | col_1| col_2 |...|col_n
col_1,col2 | x | y |...|n
col_2,col_n| a | b |...|z
I want concat columns which are present in my col_list column values
Expected DF:
col_list | col_1| col_2 |...|col_n | concat_cols
col_1,col2 | x | y |...|n | [col_1:x,col_2:y]
col_2,col_n| a | b |...|z | [col_2:b,col_n:z]
Trying with below approaches:
Creating udf to pass col_list and create expression to create array col like array(df['col_1'],df['col_2']) , but this will create column with string , not sure how to execute expression
Using create map passing col_list in loop and create column, need to check on this.
Upvotes: 0
Views: 369
Reputation: 42342
You can use some map methods:
import pyspark.sql.functions as F
df2 = df.withColumn(
'col_map',
F.map_from_arrays(
F.array([F.lit(c) for c in df.columns[1:]]),
F.array(df.columns[1:])
)
).withColumn(
'concat_cols',
F.expr("""
map_from_arrays(
split(col_list, ','),
transform(split(col_list, ','), x -> col_map[x])
)
""")
).drop('col_map')
df2.show(truncate=False)
+-----------+-----+-----+-----+------------------------+
|col_list |col_1|col_2|col_n|concat_cols |
+-----------+-----+-----+-----+------------------------+
|col_1,col_2|x |y |n |[col_1 -> x, col_2 -> y]|
|col_2,col_n|a |b |z |[col_2 -> b, col_n -> z]|
+-----------+-----+-----+-----+------------------------+
If you prefer a string representation rather than a map type column, you can do
import pyspark.sql.functions as F
df2 = df.withColumn(
'col_map',
F.map_from_arrays(
F.array([F.lit(c) for c in df.columns[1:]]),
F.array(df.columns[1:])
)
).withColumn(
'concat_cols',
F.expr("""
concat_ws(',',
transform(split(col_list, ','), x -> concat_ws(':', x, col_map[x]))
)
""")
).drop('col_map')
df2.show(truncate=False)
+-----------+-----+-----+-----+---------------+
|col_list |col_1|col_2|col_n|concat_cols |
+-----------+-----+-----+-----+---------------+
|col_1,col_2|x |y |n |col_1:x,col_2:y|
|col_2,col_n|a |b |z |col_2:b,col_n:z|
+-----------+-----+-----+-----+---------------+
Upvotes: 2