Reputation: 527
I have a dataframe with hundreds of columns:
root
|-- column1
|-- column2
|-- column3
|-- column4
|-- column5
I have a list of the column names:
struct_list = ['column4','column3','column2']
Expected Schema:
root
|-- column1
|-- column2
|-- column3
|-- column4
|-- column5
|-- prev_val
|    |-- column4
|    |-- column3
|    |-- column2
Currently I am hardcoding the column names like this:
df = df.withColumn("prev_val", f.struct(f.col("column4"), f.col("column3"), f.col("column2")))
Is there a way to pass the column names from the list dynamically?
Upvotes: 0
Views: 1686
Reputation: 42342
You can use a list comprehension:
import pyspark.sql.functions as f
struct_list = ['column4','column3','column2']
df2 = df.withColumn(
    "prev_val",
    f.struct(*[f.col(c) for c in struct_list])
)
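For reference, here is a minimal self-contained sketch of the same approach; the local SparkSession and the one-row sample data are illustrative assumptions, not part of the original question:

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Tiny stand-in for the real dataframe with hundreds of columns
df = spark.createDataFrame(
    [(1, 2, 3, 4, 5)],
    ["column1", "column2", "column3", "column4", "column5"],
)

struct_list = ['column4', 'column3', 'column2']

# Unpack one f.col per name in the list into f.struct
df2 = df.withColumn("prev_val", f.struct(*[f.col(c) for c in struct_list]))
df2.printSchema()
# root
#  |-- column1: long (nullable = true)
#  |-- column2: long (nullable = true)
#  |-- column3: long (nullable = true)
#  |-- column4: long (nullable = true)
#  |-- column5: long (nullable = true)
#  |-- prev_val: struct (nullable = false)
#  |    |-- column4: long (nullable = true)
#  |    |-- column3: long (nullable = true)
#  |    |-- column2: long (nullable = true)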
And actually, you don't even need f.col: f.struct also accepts column name strings, so you can pass the names directly:
df2 = df.withColumn(
    "prev_val",
    f.struct(*struct_list)
)
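Once the struct is in place, the nested fields can be read back with dot notation or flattened with prev_val.*; a brief usage sketch (assumes the df2 built above):

# Select a single nested field, or flatten the whole struct back out
df2.select("prev_val.column4").show()
df2.select("prev_val.*").show()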
Upvotes: 2