Rafa

Reputation: 527

How to dynamically create a struct column from a list of column names?

I have a dataframe with hundreds of columns:

root
 |-- column1
 |-- column2
 |-- column3
 |-- column4
 |-- column5

I have a list of the column names:

struct_list = ['column4','column3','column2']

Expected Schema:

root
 |-- column1
 |-- column2
 |-- column3
 |-- column4
 |-- column5
 |-- prev_val 
       |-- column4
       |-- column3
       |-- column2

Currently I am hardcoding the values like:

df = df.withColumn("prev_val", f.struct(f.col("column4"), f.col("column3"), f.col("column2")))

Is there a way to dynamically pass the values from the list?

Upvotes: 0

Views: 1686

Answers (1)

mck

Reputation: 42342

You can use a list comprehension:

import pyspark.sql.functions as f

struct_list = ['column4','column3','column2']

df2 = df.withColumn(
    "prev_val",
    f.struct(*[f.col(c) for c in struct_list])
)

And actually you don't even need f.col. You can just pass the column names directly:

df2 = df.withColumn(
    "prev_val",
    f.struct(*struct_list)
)
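The `*` in both snippets is ordinary Python argument unpacking, not anything Spark-specific: it splats the list into separate positional arguments. A minimal Spark-free sketch of the mechanism, using a hypothetical `struct` stand-in in place of `f.struct`:

```python
def struct(*cols):
    # Stand-in for f.struct: just records the positional arguments it received.
    return tuple(cols)

struct_list = ['column4', 'column3', 'column2']

# struct(*struct_list) is equivalent to struct('column4', 'column3', 'column2')
result = struct(*struct_list)
print(result)  # ('column4', 'column3', 'column2')
```

So `f.struct(*struct_list)` builds the struct's fields in the order the list gives them, which is why the resulting schema lists `column4`, `column3`, `column2` in that order.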

Upvotes: 2
