I have a PySpark DataFrame, for example:
id | column_1 | column_2 | column_3
---|----------|----------|---------
1  | ["12"]   | [""]     | ["67"]
2  | [""]     | ["78"]   | ["90"]
3  | [""]     | ["93"]   | ["56"]
4  | ["100"]  | ["78"]   | ["90"]
I want to convert all [""] values in the columns column_1, column_2, and column_3 to null. The type of each of these three columns is an array of strings.
Expected result:
id | column_1 | column_2 | column_3
---|----------|----------|---------
1  | ["12"]   | null     | ["67"]
2  | null     | ["78"]   | ["90"]
3  | null     | ["93"]   | ["56"]
4  | ["100"]  | ["78"]   | ["90"]
I tried the solution below:
from pyspark.sql import functions as F

df = df.withColumn(
    "column_1",
    F.when((F.size(F.col("column_1")) == ""), F.lit(None))
     .otherwise(F.col("column_1"))
).withColumn(
    "column_2",
    F.when((F.size(F.col("column_2")) == ""), F.lit(None))
     .otherwise(F.col("column_2"))
).withColumn(
    "column_3",
    F.when((F.size(F.col("column_3")) == ""), F.lit(None))
     .otherwise(F.col("column_3"))
)
But it converts everything to null. How can I test for an array that contains a single empty string, i.e. [""] rather than []? Thank you.
You can test with a when and replace the values:
df.withColumn(
    "column_1",
    F.when(
        F.col("column_1") != F.array(F.lit("")),  # keep the value unless it equals [""]
        F.col("column_1")
    )
)
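Since when is used without an otherwise, rows where the condition does not hold (here, rows where the column equals [""]) are returned as null, which is exactly the replacement you want.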
Do that for each of your columns.
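A minimal sketch of applying this to all three columns in a loop (assuming df and the column names from the question):

from pyspark.sql import functions as F

# Sketch: replace [""] with null in each of the three array<string> columns.
for c in ["column_1", "column_2", "column_3"]:
    df = df.withColumn(
        c,
        # No otherwise(): rows where the column equals [""] become null.
        F.when(F.col(c) != F.array(F.lit("")), F.col(c)),
    )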