Viv

Reputation: 1584

How to create a Pyspark UDF for adding new columns to a dataframe

I have 5 columns (A - E) to add to a dataframe. The values for these columns are stored in variables (a - e).

Instead of using

 df.withColumn("A", a).withColumn("B", b).withColumn..... etc 

Can we do this with a udf?

Currently I have a named function:

    def add_col(df_name, newCol, value):
        df = df_name
        df = df.withColumn(newCol, value)
        return df

But I am not able to understand how to convert this into a UDF and use it. Please help.

Upvotes: 1

Views: 936

Answers (2)

T. Gawęda

Reputation: 16076

You should not use a UDF here: a UDF produces a single value per row and cannot add multiple columns.

However, you can write a select statement similar to the one in the other answer:

df.select(col("*"), lit(a).as("a"), lit(b).as("b"), ...)

You can also automate adding the columns (Scala):

    val fieldsMap = Map("a" -> a, "b" -> b)
    df.select(Array(col("*")) ++ fieldsMap.map(e => lit(e._2).as(e._1)) : _*)

Upvotes: 1

Alper t. Turker

Reputation: 35229

If you want to add multiple columns, you can use select with *:

df.select("*", some_column, another_column, ...)

Upvotes: 1
