Reputation: 174706
I want to replace boolean values stored as strings inside a column with actual boolean values.
kdf = pd.DataFrame(data={'col1': [True, 'True', np.nan],
                         'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'],
                         'bool': [False, True, True],
                         'bnan': [False, True, np.nan]})
Here, I want to convert the string 'True' (index 1 of col1) to the actual boolean True. What I did was:
kdf.loc[kdf['col1'].str.contains('true', na=False, case=False)] = True
kdf.loc[kdf['col1'].str.contains('false', na=False, case=False)] = False
This converts the column values to the actual type, but I need to create a function that accepts only the df column, does an in-place replace, and returns the modified column (like col.fillna). Note that we are not allowed to pass the whole df into that function, so I can't use df.loc.
I'm also a bit worried about performance. Is there any other way?
Upvotes: 0
Views: 5079
Reputation: 1488
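# str(x) is needed because the column also holds real booleans and NaN, which have no .strip()
df['col'] = df['col'].apply(lambda x: str(x).strip().lower() == 'true')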
I think the above should work.
Hope this helps!
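Since the question asks for a function that receives only the column, here is a minimal sketch of that idea. The name strings_to_bool is made up for illustration, and the sketch assumes that only the strings 'true' and 'false' (any case, surrounding spaces allowed) should be converted, with every other value, including NaN, left untouched.

```python
import numpy as np
import pandas as pd


def strings_to_bool(col: pd.Series) -> pd.Series:
    """Hypothetical helper: turn 'true'/'false' strings into real booleans."""

    def convert(value):
        # Only strings are inspected; real booleans and NaN pass through as-is.
        if isinstance(value, str):
            key = value.strip().lower()
            if key == 'true':
                return True
            if key == 'false':
                return False
        return value

    # Series.map applies the function element by element and returns a new Series.
    return col.map(convert)


col = pd.Series([True, ' true ', np.nan, 'False'])
result = strings_to_bool(col)
print(result)
```

It would be used like fillna, by assigning the returned column back: kdf['col1'] = strings_to_bool(kdf['col1']).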
Upvotes: 1
Reputation: 7353
Expanding on @89f3a1c's solution and @AvinashRaj's comment:
We introduce the following problem into the data.
1. The string 'True' is changed to ' true '. This introduces a case mismatch and leading and trailing spaces.
import numpy as np
import pandas as pd
from datetime import datetime

kdf = pd.DataFrame(data={'col1': [True, ' true ', np.nan],
                         'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'],
                         'bool': [False, True, True],
                         'bnan': [False, True, np.nan]})
# str(x) also covers real booleans and NaN; anything that is not 'true' (NaN included) becomes False
kdf['col1'] = kdf['col1'].apply(lambda x: str(x).strip().lower() == 'true')
Dataframe:
     col1                          dt   bool   bnan
0    True  2019-09-19 03:22:06.734861  False  False
1   true   2018-12-12 00:00:00.000000   True   True
2     NaN  2019-12-12 00:00:00.000000   True    NaN
Output:
    col1                          dt   bool   bnan
0   True  2019-09-19 03:26:47.611914  False  False
1   True  2018-12-12 00:00:00.000000   True   True
2  False  2019-12-12 00:00:00.000000   True    NaN
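On the performance concern from the question: the same comparison can probably be expressed with pandas' vectorized string methods instead of a Python lambda. This is only a sketch, not something benchmarked here, and the function name to_bool_vectorized is made up. Like the apply version, it assumes that anything other than 'true' (NaN included) should end up as False.

```python
import numpy as np
import pandas as pd


def to_bool_vectorized(col: pd.Series) -> pd.Series:
    """Hypothetical helper: True where the value reads as 'true', else False."""
    # Cast everything to text first so that .str works on a mixed column.
    text = col.astype(str).str.strip().str.lower()
    # Element-wise comparison; NaN and every non-'true' value give False.
    return text == 'true'


col = pd.Series([True, ' true ', np.nan, 'False'])
out = to_bool_vectorized(col)
print(out.tolist())
```

Because it takes and returns a single column, it can be assigned back with kdf['col1'] = to_bool_vectorized(kdf['col1']).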
Upvotes: 0
Reputation: 323226
Why not use replace?
df.replace({'True':True,'False':False})
# df.replace({'True':True,'False':False}).applymap(type)
Out[123]:
              bnan            bool             col1             dt
0   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
1   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
2  <class 'float'>  <class 'bool'>  <class 'float'>  <class 'str'>
Update: pass regex=True so that strings with leading or trailing spaces are matched as well.
df.replace({'True':True,'False':False},regex=True).applymap(type)
Sample data (note that I added leading and trailing spaces):
df = pd.DataFrame(data={'col1': [True, ' True', np.nan],
                        'dt': [' 2018-12-12', ' 2018-12-12', '2019-12-12'],
                        'bool': [False, True, True],
                        'bnan': ['False ', True, np.nan]})
Upvotes: 1