krackoder

Reputation: 2981

Replace all values of a column in a dataframe with pyspark

I am looking to replace all the values of a column in a Spark dataframe with a particular value. I am using pyspark. I tried something like:

new_df = df.withColumn('column_name',10)

Here I want to replace all the values in the column column_name to 10. In pandas this could be done by df['column_name']=10. I am unable to figure out how to do the same in Spark.

Upvotes: 7

Views: 10837

Answers (2)

architectonic

Reputation: 3129

It might be easier to use lit as follows:

from pyspark.sql.functions import lit
new_df = df.withColumn('column_name', lit(10))

Upvotes: 7

Alberto Bonsanto

Reputation: 18022

You can use a UDF to replace the value, and currying lets the same UDF support different replacement values. Note that `udf` defaults to a string return type, so an explicit `IntegerType` is needed here, and the new column is set with `withColumn` (not `withColumnRenamed`, which only renames columns):

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

def replacerUDF(value):
    # Returns a UDF that ignores its input and always yields `value`
    return udf(lambda x: value, IntegerType())

new_df = df.withColumn("newCol", replacerUDF(10)(col("column_name")))

Upvotes: 2
