J Jones

Reputation: 3190

Use None instead of np.nan for null values in pandas DataFrame

I have a pandas DataFrame with mixed data types. I would like to replace all null values with None (instead of the default np.nan). For some reason, this appears to be nearly impossible.

In reality my DataFrame is read in from a csv, but here is a simple DataFrame with mixed data types to illustrate my problem.

import numpy as np
import pandas as pd
df = pd.DataFrame(index=[0], columns=range(5))
df.iloc[0] = [1, 'two', np.nan, 3, 4]

I can't do:

>>> df.fillna(None)
ValueError: must specify a fill method or value

nor:

>>> df[df.isnull()] = None
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

nor:

>>> df.replace(np.nan, None)
TypeError: cannot replace [nan] with method pad on a DataFrame

I used to have a DataFrame with only string values, so I could do:

>>> df[df == ""] = None

which worked. But now that I have mixed datatypes, it's a no go.

For reasons specific to my code, it would be helpful to be able to use None as my null value. Is there a way I can set the null values to None? Or do I just have to go back through my other code and make sure I'm using np.isnan or pd.isnull everywhere?

Upvotes: 54

Views: 76310

Answers (3)

apinanyogaratnam

Reputation: 708

I found this helpful for replacing NaN values with None: df.replace({np.nan: None}) (the pd.np alias was removed in pandas 2.0, so use NumPy's np.nan directly).

Upvotes: 4

KingOtto

Reputation: 1493

Expanding on the accepted answer: when you also need to catch NaN values in numeric dtype columns, you may need to change the dtype to object first:

df.astype(object).where(df.notna(), None)

as per original reply by @BENNY
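
A minimal sketch of why the cast matters, assuming a small illustrative frame (the column names a and b are hypothetical and not from the question). Casting to object first lets where store None in what was originally a float column:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one float column, one string column.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

# Cast every column to object so that None can be stored as-is,
# then keep original values where they are not null and use None elsewhere.
result = df.astype(object).where(df.notna(), None)

print(result["a"].tolist())
print(result["b"].tolist())
```

Note that the result holds object-dtype columns, so numeric operations on it will generally be slower than on the original float column.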

Upvotes: 8

piRSquared

Reputation: 294488

Use pd.DataFrame.where
It keeps the df value where the condition is met and uses None otherwise.

df.where(df.notnull(), None)
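
As a hedged illustration on the question's own row, assuming the frame holds object-dtype columns (it is built here with dtype=object, which is an assumption of this sketch rather than something the question states):

```python
import numpy as np
import pandas as pd

# The question's row, stored with object dtype (an assumption of this sketch).
df = pd.DataFrame([[1, "two", np.nan, 3, 4]], dtype=object)

# Keep each value where it is not null; use None everywhere else.
result = df.where(df.notnull(), None)

row = result.iloc[0].tolist()
print(row)
```

On columns with a numeric dtype, recent pandas versions may coerce None back to NaN. Either way, pd.isnull treats both None and NaN as missing.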


Upvotes: 99
