Reputation: 13
I tried to run fillna to insert NaN into a column whose name contains the special character ".":
df = spark.createDataFrame(
[(None, None), ('U1', None), ('U3', 1.0)],
['USER_ID', 'a.b']
)
I tried
df = df.fillna({"`a.b`": float("nan")})
also
df = df.fillna({"a.b": float("nan")})
Neither of them works. Does anyone have experience with this?
Upvotes: 1
Views: 2185
Reputation: 43524
It seems that pyspark.sql.DataFrame.fillna() has a limitation: it does not let you specify column names containing periods when you pass the value
parameter as a dictionary.
From the docs:
value – int, long, float, string, bool or dict. Value to replace null values with. If the value is a dict, then subset is ignored and value must be a mapping from column name (string) to replacement value. The replacement value must be an int, long, float, boolean, or string.
You should be able to use fillna
using the other syntax that specifies both the value
and subset
parameters.
df.fillna(value=float("nan"), subset=["a.b"]).show()
#+-------+---+
#|USER_ID|a.b|
#+-------+---+
#| null|NaN|
#| U1|NaN|
#| U3|1.0|
#+-------+---+
The above worked for me in Spark 2.4, but I don't see why it should not work on older versions.
If you are still having trouble, another approach is to temporarily rename your columns, call fillna
, and then rename the columns back to their original values.
Here I rename the columns to replace the "."
with the string "_DOT_"
, which I deliberately picked to avoid conflicting with existing substrings in other column names.
df.toDF(*[c.replace(".", "_DOT_") for c in df.columns])\
.fillna({"a_DOT_b": float("nan")})\
.toDF(*df.columns)\
.show()
#+-------+---+
#|USER_ID|a.b|
#+-------+---+
#| null|NaN|
#| U1|NaN|
#| U3|1.0|
#+-------+---+
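The rename trick above can be sketched in plain Python. The safe_dot_rename helper below is hypothetical (not part of PySpark); it just performs the same "." substitution and additionally checks that the substitution does not collapse two distinct column names into one:

```python
def safe_dot_rename(columns, token="_DOT_"):
    """Replace '.' in column names with a token, failing on collisions."""
    renamed = [c.replace(".", token) for c in columns]
    # If two original names map to the same result, the rename is unsafe
    if len(set(renamed)) != len(renamed):
        raise ValueError("renaming caused a column-name collision; pick another token")
    return renamed

print(safe_dot_rename(["USER_ID", "a.b"]))  # ['USER_ID', 'a_DOT_b']
```

You would pass the result to df.toDF(*...) as shown above, and restore the original names with df.toDF(*df.columns) afterwards.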
Upvotes: 1
Reputation: 13581
This works:
df = spark.createDataFrame([(None, None), ('U1', None), ('U3', 1.0)], ['USER_ID', 'a.b'])
df = df.fillna(float("nan"), ['`a.b`'])
df.show(10, False)
+-------+---+
|USER_ID|a.b|
+-------+---+
|null |NaN|
|U1 |NaN|
|U3 |1.0|
+-------+---+
Upvotes: 0