aabyssx

Reputation: 33

How to set pandas.DataFrame cell to null without FutureWarning

I would like to set some cells to null based on a condition. For example:

import pandas as pd  # version is 2.2.2

df = pd.DataFrame({'x': [1, 2, 2, 1, 1, 2]})
df["b"] = False
df.loc[df["x"] == 1, "b"] = pd.NA

It works, but I get this warning:

FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.

I tried reading the documentation and looking at examples, but could not find a solution. What is the correct way to do this?

Upvotes: 1

Views: 51

Answers (1)

mozway

Reputation: 260300

By defining b with df['b'] = False, you set the column's dtype to bool. Since pd.NA is not a bool, it cannot be stored in that column without changing the column's dtype, which triggers the warning (this will become an error in a future version of pandas).
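A quick way to see this is to inspect the dtype right after the scalar assignment. This is only a small sketch that reuses the question's data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 1, 1, 2]})

# Assigning a plain Python bool broadcasts it into a NumPy-backed bool column
df["b"] = False

# NumPy's bool dtype has no slot for a missing value
dtype_name = str(df["b"].dtype)
print(dtype_name)
```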

You could initialize the column as object:

import numpy as np

df['b'] = np.array(False, dtype='object')

df.loc[df['x']==1, 'b'] = pd.NA

Then df['b'].dtype is dtype('O') (object).

Or, better, as nullable boolean:

df['b'] = pd.Series(False, index=df.index, dtype='boolean')

df.loc[df['x']==1, 'b'] = pd.NA
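To check that the nullable boolean column really avoids the warning, one option is to turn FutureWarning into an error around the assignment. This is a minimal sketch on the question's data; the warnings filter is only there for verification and is not part of the fix:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 1, 1, 2]})

# Nullable boolean column, initialized to False for every row
df["b"] = pd.Series(False, index=df.index, dtype="boolean")

with warnings.catch_warnings():
    # Any FutureWarning raised inside this block becomes an exception
    warnings.simplefilter("error", FutureWarning)
    df.loc[df["x"] == 1, "b"] = pd.NA

print(df["b"].dtype)
print(df)
```

If the assignment were still incompatible with the column's dtype, the block would raise instead of completing silently.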

Note that you could also first initialize a nullable boolean column of <NA>s, then assign False where df['x']!=1:

df['b'] = pd.Series(dtype='boolean')

df.loc[df['x']!=1, 'b'] = False

Now df['b'].dtype is BooleanDtype (nullable boolean).

Output:

   x      b
0  1   <NA>
1  2  False
2  2  False
3  1   <NA>
4  1   <NA>
5  2  False
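If the column already exists as a plain bool (as in the question), another option is to follow the warning's own advice and cast it before assigning. A sketch, assuming the same df as above and using astype with the nullable 'boolean' dtype:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 1, 1, 2]})
df["b"] = False  # plain bool column, as in the question

# Cast explicitly to the nullable boolean dtype first
df["b"] = df["b"].astype("boolean")

# pd.NA is now a valid value for this column
df.loc[df["x"] == 1, "b"] = pd.NA

print(df)
```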

Upvotes: 0
