Avinash Raj

Reputation: 174706

Python pandas: replace string boolean values in a column with actual booleans

I want to replace string representations of booleans (such as 'True') inside a column with actual boolean values.

import numpy as np
import pandas as pd
from datetime import datetime
kdf = pd.DataFrame(data={'col1': [True, 'True', np.nan], 'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'],
                         'bool': [False, True, True], 'bnan': [False, True, np.nan]})

So here I want to convert the string 'True' (index 1 of col1) to the actual boolean True. What I did was:

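# restrict the assignment to 'col1'; without a column label, .loc overwrites every column of the matched rows
kdf.loc[kdf['col1'].str.contains('true', na=False, case=False), 'col1'] = True
kdf.loc[kdf['col1'].str.contains('false', na=False, case=False), 'col1'] = False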

This converts the matching values to actual booleans, but I need a function that accepts only the DataFrame column, does the replacement, and returns the modified column (like col.fillna). We are not allowed to pass the whole DataFrame into that function, so I can't use df.loc.

I'm also a bit worried about performance. Is there another way?
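A minimal sketch of such a column-only helper, assuming an object-dtype column and that only exact 'true'/'false' matches (after stripping whitespace and ignoring case) should be converted. This is stricter than the `str.contains` check above, which would also match a string like 'untrue'. The name `replace_bool_strings` is hypothetical. It works on a copy of the Series and uses vectorized `.str` methods rather than a per-row Python function:

```python
import numpy as np
import pandas as pd


def replace_bool_strings(col: pd.Series) -> pd.Series:
    """Return a copy of `col` with 'true'/'false' strings replaced by real booleans.

    Hypothetical helper: matching is case-insensitive and ignores surrounding
    whitespace; everything else (real bools, NaN, other strings) is kept as-is.
    Assumes an object-dtype column that can hold mixed values.
    """
    # Normalise every cell to a stripped, lowercase string used only for comparison.
    normalized = col.astype(str).str.strip().str.lower()
    out = col.copy()  # never modify the caller's Series, like col.fillna
    out[normalized == 'true'] = True
    out[normalized == 'false'] = False
    return out


kdf = pd.DataFrame({'col1': [True, 'True', np.nan, ' false ']})
kdf['col1'] = replace_bool_strings(kdf['col1'])
print(kdf['col1'].tolist())
```

Because only the Series goes in and a new Series comes out, the caller decides where to assign it, which matches the "no whole DataFrame" constraint.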

Upvotes: 0

Views: 5079

Answers (3)

89f3a1c

Reputation: 1488

# str() guards against non-string cells such as the boolean True or NaN
df['col'] = df['col'].apply(lambda x: str(x).strip().lower() == 'true')

I think the above should work.

Hope this helps!
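A vectorized variant of the same comparison, sketched under the assumption that, as in this answer, anything other than a 'true' string (including NaN) should become `False`. Converting with `astype(str)` first keeps non-string cells such as the boolean `True` or `NaN` from raising, and the whole-column `.str` operations avoid calling a Python lambda for every row, which may help with the performance concern in the question (not benchmarked here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [True, ' TRUE ', np.nan, 'False']})

# Stringify every cell, normalise whitespace and case, then compare in one pass.
# The result is a bool-dtype column; NaN and 'False' both become False.
df['col'] = df['col'].astype(str).str.strip().str.lower().eq('true')
print(df['col'].tolist(), df['col'].dtype)
```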

Upvotes: 1

CypherX

Reputation: 7353

Expanding on @89f3a1c's solution and @AvinashRaj's comment:

We introduce the following problem into the data:
1. The string 'True' is changed to ' true  ', which introduces a case mismatch as well as leading and trailing spaces.

import numpy as np
import pandas as pd
from datetime import datetime

kdf = pd.DataFrame(data={'col1': [True, ' true  ', np.nan],
                         'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'],
                         'bool': [False, True, True],
                         'bnan': [False, True, np.nan]})

kdf['col1'] = kdf['col1'].apply(lambda x: str(x).strip() in ('true', 'True'))

DataFrame:

   col1                          dt   bool   bnan
0  True  2019-09-19 03:22:06.734861  False  False
1  true  2018-12-12 00:00:00.000000   True   True
2   NaN  2019-12-12 00:00:00.000000   True    NaN

Output:

    col1                          dt   bool   bnan
0   True  2019-09-19 03:26:47.611914  False  False
1   True  2018-12-12 00:00:00.000000   True   True
2  False  2019-12-12 00:00:00.000000   True    NaN
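The lambda can also be written without `apply`, as a sketch: `isin` tests membership for the whole column at once and reproduces the same matching rule (only 'true' or 'True' after stripping, so NaN becomes `False`). This is an equivalent rewrite, not the answer's original code:

```python
import numpy as np
import pandas as pd

kdf = pd.DataFrame({'col1': [True, ' true  ', np.nan]})

# Same rule as the apply/lambda version: stringify, strip, then check
# membership in the accepted spellings. 'nan' is not in the list -> False.
kdf['col1'] = kdf['col1'].astype(str).str.strip().isin(['true', 'True'])
print(kdf['col1'].tolist())
```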

Upvotes: 0

BENY

Reputation: 323226

Why not use replace?

df.replace({'True':True,'False':False})
# df.replace({'True':True,'False':False}).applymap(type)
Out[123]: 
              bnan            bool             col1             dt
0   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
1   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
2  <class 'float'>  <class 'bool'>  <class 'float'>  <class 'str'>
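Because `replace` is also a `Series` method, the same mapping fits the question's constraint of receiving only the column. A small sketch (the helper name `fix_bool_strings` is hypothetical); note that this exact-match form leaves strings with extra whitespace, such as ' True', unchanged:

```python
import numpy as np
import pandas as pd


def fix_bool_strings(col: pd.Series) -> pd.Series:
    # Series.replace returns a new Series by default, much like col.fillna;
    # only cells exactly equal to 'True' or 'False' are replaced.
    return col.replace({'True': True, 'False': False})


col = pd.Series([True, 'True', np.nan, 'False', ' True'])
print(fix_bool_strings(col).tolist())
```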

Update

df.replace({'True': True, 'False': False}, regex=True).applymap(type)
# applymap is deprecated since pandas 2.1; use .map(type) there instead

Sample data (note that I added leading and trailing spaces):

import numpy as np
import pandas as pd
df = pd.DataFrame(data={'col1': [True, ' True', np.nan], 'dt': [' 2018-12-12', ' 2018-12-12', '2019-12-12'],
                        'bool': [False, True, True], 'bnan': ['False  ', True, np.nan]})
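One caveat with `regex=True`: a pattern such as 'True' matches anywhere inside a string, and when the replacement value is not a string pandas replaces the entire cell, so a value like 'Trueblood' would also become `True`. Anchoring the patterns and allowing only surrounding whitespace is one way to restrict the match; the sketch below assumes that is the desired behaviour:

```python
import numpy as np
import pandas as pd

col = pd.Series([True, ' True', np.nan, 'False  ', 'Trueblood'])

# ^\s* and \s*$ anchor the match so only 'True'/'False' padded with
# whitespace are replaced; other strings containing the word are kept.
patterns = {r'^\s*True\s*$': True, r'^\s*False\s*$': False}
out = col.replace(patterns, regex=True)
print(out.tolist())
```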

Upvotes: 1
