Reputation: 4991
Assume I have the following data:
+--------------------+-----+--------------------+
| values|count| values2|
+--------------------+-----+--------------------+
| aaaaaa| 249| null|
| bbbbbb| 166| b2|
| cccccc| 1680| something|
+--------------------+-----+--------------------+
If there is a null value in the values2
column, how can I assign the value from the values
column to it? The result should be:
+--------------------+-----+--------------------+
| values|count| values2|
+--------------------+-----+--------------------+
| aaaaaa| 249| aaaaaa|
| bbbbbb| 166| b2|
| cccccc| 1680| something|
+--------------------+-----+--------------------+
I thought of something like the following, but it doesn't work:
df.na.fill({"values2":df['values']}).show()
I found this way to solve it but there should be something more clear forward:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def change_null_values(a, b):
    # Note: `if b:` also treats empty strings as missing;
    # use `if b is not None:` to replace only real nulls.
    if b:
        return b
    else:
        return a

udf_change_null = udf(change_null_values, StringType())
df.withColumn("values2", udf_change_null("values", "values2")).show()
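For intuition, the fallback the UDF performs is plain per-value logic. A minimal pure-Python sketch of the same behavior (no Spark needed; it uses `is not None` so only real nulls are replaced):

```python
def fill_null(a, b):
    # Return b when it is present; fall back to a when b is None.
    return b if b is not None else a

rows = [("aaaaaa", None), ("bbbbbb", "b2"), ("cccccc", "something")]
print([fill_null(a, b) for a, b in rows])  # ['aaaaaa', 'b2', 'something']
```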
Upvotes: 3
Views: 26741
Reputation: 1359
Following up on @shadow_dev's method:
from pyspark.sql.functions import col, when

df.withColumn("values2",
              when(col("values2").isNull(), col("values"))
              .otherwise(col("values2")))
Dmytro Popovych's solution is still the cleanest.
If you need fancier when/otherwise logic:
df.withColumn("values2",
              when(col("values2").isNull() | col("values3").isNull(), col("values"))
              .when(col("values") == col("values2"), 1)
              .otherwise(0))
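Row by row, that chain evaluates like an if/elif/else: the first matching branch wins. A rough pure-Python equivalent of the chained conditions (generic argument names, purely illustrative):

```python
def pick(v1, v2, v3):
    # First matching branch wins, mirroring when/when/otherwise.
    if v2 is None or v3 is None:
        return v1
    if v1 == v2:
        return 1
    return 0

print(pick("aaaaaa", None, "x"))  # aaaaaa
print(pick("b2", "b2", "x"))      # 1
print(pick("b2", "other", "x"))   # 0
```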
Upvotes: 4
Reputation: 130
You can use the column attribute .isNull():
from pyspark.sql.functions import col

df.where(col("dt_mvmt").isNull())      # rows where dt_mvmt is null
df.where(col("dt_mvmt").isNotNull())   # rows where dt_mvmt is not null
This comes from another answer - I just don't have enough reputation to add a comment.
Upvotes: -3
Reputation: 954
You can use pyspark.sql.functions.coalesce (https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.functions.coalesce):
from pyspark.sql.functions import coalesce

df.withColumn('values2', coalesce(df.values2, df.values)).show()
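coalesce returns the first non-null of its arguments for each row. A sketch of that behavior in plain Python (an illustration, not Spark's implementation):

```python
def coalesce(*cols):
    # First value that is not None, or None if every argument is null.
    return next((c for c in cols if c is not None), None)

print(coalesce(None, "aaaaaa"))  # aaaaaa
print(coalesce("b2", "bbbbbb"))  # b2
```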
Upvotes: 5