Reputation: 649
I have a DataFrame with many columns (more than 50). I want to select all of them as they are, renaming only a few, while keeping the order shown below. I tried the following:
cols = list(set(df.columns) - {'id','starttime','endtime'})
df.select(col("id").alias("eventid"),col("starttime").alias("eventstarttime"),col("endtime").alias("eventendtime"),*cols,lit(proceessing_time).alias("processingtime"))
and got this error:
SyntaxError: only named arguments may follow *expression
Also, instead of *cols, I tried passing a list of Column objects:
df.select(col("id").alias("eventid"),col("starttime").alias("eventstarttime"),col("endtime").alias("eventendtime"),([col(x) for x in cols]),lit(proceessing_time).alias("processingtime"))
which gives the following error:
`TypeError: 'Column' object is not callable`
Any help is highly appreciated.
Upvotes: 6
Views: 17446
Reputation: 5870
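select also accepts a single list, so we can concatenate the renamed columns, the remaining column names, and the extra literal column into one list and pass that to df.select: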
df.select([col("id").alias("eventid"),col("starttime").alias("eventstarttime"),col("endtime").alias("eventendtime")]+cols+[lit(proceessing_time).alias("processingtime")])
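Two side notes. The SyntaxError in the question is what Python versions before 3.5 raise when a positional argument follows `*cols`; concatenating lists as above avoids it on any version. Also, building `cols` with a set difference does not keep the original column order. A minimal sketch of an order-preserving alternative follows; it is plain Python, and `remaining_columns`, `all_columns` and `renames` are hypothetical names, with `all_columns` standing in for `df.columns`:

```python
def remaining_columns(all_columns, renamed):
    """Return the columns that are not in `renamed`, in their original order."""
    skip = set(renamed)  # set lookup is fast; the order comes from all_columns
    return [c for c in all_columns if c not in skip]


# Hypothetical column list standing in for df.columns
all_columns = ["id", "starttime", "endtime", "user", "country", "value"]

# Columns to rename, mapped to their new names (from the question)
renames = {
    "id": "eventid",
    "starttime": "eventstarttime",
    "endtime": "eventendtime",
}

cols = remaining_columns(all_columns, renames)
print(cols)  # ['user', 'country', 'value']
```

The resulting `cols` can be used in the expression above in place of the set-based list, assuming the goal is to keep the untouched columns in their DataFrame order.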
Upvotes: 7