Reputation: 437
I am trying to convert a column containing True/False and null values in string format to Boolean, but whatever I do, I end up with either all True or all False values. Below is my approach.
Consider the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'w':['True', np.nan, 'False',
'True', np.nan, 'False']})
df['w'].dtypes
Out: dtype('O')
df['w'].unique()
Out: array([True, nan, False], dtype=object)
d = {'nan': np.nan,'False':False, 'True': True}
df['w']=df['w'].map(d)
df['w'].dtypes
Out: dtype('O')
df['w'].unique()
Out: array([nan], dtype=object)
One other approach I used is following this SO post:
d = {'nan': 0,'False':0, 'True': 1 }
df['w']=df['w'].map(d)
df['w']=df['w'].astype('bool')
Now the column becomes bool, but all values are converted to True:
df['w'].dtypes
Out: dtype('bool')
df['w'].unique()
Out: array([ True])
What am I doing wrong? I want all null values to stay null.
Upvotes: 2
Views: 7900
Reputation: 862511
I think the conversion is not necessary, because your original data already contains booleans with nans. The dtype is object because the values are mixed (booleans with missing values):
df = pd.DataFrame({'w':['True', np.nan, 'False']})
print (df['w'].unique())
['True' nan 'False']
print ([type(x) for x in df['w'].unique()])
[<class 'str'>, <class 'float'>, <class 'str'>]
If nan is also a string, then your solution works:
df = pd.DataFrame({'w':['True', 'nan', 'False']})
print ([type(x) for x in df['w'].unique()])
[<class 'str'>, <class 'str'>, <class 'str'>]
d = {'nan': np.nan,'False':False, 'True': True}
df['w'] = df['w'].map(d)
print (df['w'].unique())
[True nan False]
print ([type(x) for x in df['w'].unique()])
[<class 'bool'>, <class 'float'>, <class 'bool'>]
df = pd.DataFrame({'w':[True, np.nan, False]})
print (df['w'].unique())
[True nan False]
print ([type(x) for x in df['w'].unique()])
[<class 'bool'>, <class 'float'>, <class 'bool'>]
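The all-True result from the question can be reproduced with a minimal sketch, assuming the column already holds real booleans plus a missing value, as the question's unique() output suggests. The names s, mapped and as_bool are only illustrative. String keys do not match boolean values, so map returns NaN everywhere, and NaN is truthy when cast to bool:

```python
import numpy as np
import pandas as pd

# Assumed starting point: real booleans with a missing value (object dtype).
s = pd.Series([True, np.nan, False], dtype=object)

# The dictionary keys are strings, so none of the bool/float values match
# and every entry is mapped to NaN.
d = {'nan': 0, 'False': 0, 'True': 1}
mapped = s.map(d)
print(mapped.isna().all())

# NaN is a non-zero float, so casting to bool turns it into True.
as_bool = mapped.astype('bool')
print(as_bool.unique())
```

This is why the second approach ends with a column that contains only True.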
If you want to replace nan with False, use Series.fillna:
df['w'] = df['w'].fillna(False)
print (df)
       w
0   True
1  False
2  False
print (df['w'].dtypes)
bool
print (df['w'].unique())
[ True False]
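If the missing values should stay missing while the column still gets a Boolean dtype, one option not covered above is the nullable 'boolean' extension dtype. This is a minimal sketch, assuming pandas 1.0 or later and a column of 'True'/'False' strings with float nan for missing entries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'w': ['True', np.nan, 'False']})

# Map only the string labels; the float NaN has no key and stays missing.
d = {'True': True, 'False': False}

# 'boolean' (lowercase string) is the nullable extension dtype,
# which stores missing values as <NA> instead of coercing them.
df['w'] = df['w'].map(d).astype('boolean')

print(df['w'].dtype)
print(df['w'].isna().tolist())
```

Unlike astype('bool'), this cast does not turn the missing entry into True.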
Upvotes: 2