prasannads

Reputation: 649

Pyspark Dataframe select all columns with alias on few columns

I have a dataframe with many columns (more than 50) and want to select all of them as they are, with a few columns renamed, while maintaining the order below. I tried the following:

cols = list(set(df.columns) - {'id', 'starttime', 'endtime'})
df.select(col("id").alias("eventid"),
          col("starttime").alias("eventstarttime"),
          col("endtime").alias("eventendtime"),
          *cols,
          lit(processing_time).alias("processingtime"))

and got the error: `SyntaxError: only named arguments may follow *expression`

Also, instead of `*cols`, I tried passing a list of `Column` objects:

df.select(col("id").alias("eventid"),
          col("starttime").alias("eventstarttime"),
          col("endtime").alias("eventendtime"),
          [col(x) for x in cols],
          lit(processing_time).alias("processingtime"))

which gives the following error:

`TypeError: 'Column' object is not callable`

Any help is highly appreciated.

Upvotes: 6

Views: 17446

Answers (1)

Suresh

Reputation: 5870

We could concatenate the columns into a single list and select from the dataframe:

df.select([col("id").alias("eventid"),
           col("starttime").alias("eventstarttime"),
           col("endtime").alias("eventendtime")]
          + cols
          + [lit(processing_time).alias("processingtime")])

Upvotes: 7
