user12809368
user12809368

Reputation:

Converting strings to boolean gives only False value

Trying to convert data (string format from a csv file) to boolean (in a dataframe), I have 'lost' information on their original value, so now all the values are boolean False.

The columns that I am trying to change into boolean are the following:

df['Col1'] =df['Col1'].astype('bool')
df['Col2'] =df['Col2'].astype('bool')

I have also tried with

df.Col1 = np.where(df.Col1.eq('true'), True, False)
df. Col2 = np.where(df.Col2.eq('true') | df.Col2.eq('tbc'), True, False)

The unique values for each column, Col1 and Col2 are:

Col1: array([true, false, nan], dtype=object)

Col2: array(['true', 'false', 'tbc', nan], dtype=object)

My original dataset has the following values.

Col1      Col2
true      true
true      true
false     false
nan       false
false     true
true      tbc

Though they were converted into boolean, all the values are False:

Col1      Col2
False     False
False     False
False     False
False     False
False     False
False     False

I would like to treat TBC as True. Why am I getting only False values? Any idea on how I could fix it?

Sample of original dataset and code:

Date                 Checked             Verified
2018-05-23           FALSE                TRUE
2018-05-24           TRUE                 TBC
2018-05-26           FALSE                TBC
2018-05-31            nan                 nan
2019-12-01           TRUE                 TRUE
2019-12-05           TRUE                 TBC
2019-12-15           TRUE                 FALSE
2019-12-23           FALSE                nan

Code

Read the file csv:

df=pd.read_csv(path, sep=';', engine='python')

Transform to lower case

df= df.apply(lambda x: x.astype(str).str.lower())

Transform string to boolean

df['Checked'] = np.where(df['Checked'].eq('true'), True, False)
df['Verified'] = np.where(df['Verified'].eq('true') | df['Verified'].eq('tbc'), True, False)

Then I test how many rows have value Checked = True:

len(df[df['Checked']=='true']) 

output: 153

Convert to boolean Checked:

df['Checked'] = np.where(df['Checked'].eq('true'), True, False)
len(df[df['Checked']==True])

output: 153

Convert Verified to boolean:

df['Verified'] = np.where(df['Verified'].eq('true') | df['Verified'].eq('tbc'), True, False)

len(df[df['Verified']==True])

output: 0 (expected 60)

Upvotes: 1

Views: 422

Answers (2)

Cerno
Cerno

Reputation: 831

For me, this works perfectly. Since you haven't given the exact csv format, I assumed it would be like this:

Date;Checked;Verified
2018-05-23;FALSE;TRUE
2018-05-24;TRUE;TBC
2018-05-26;FALSE;TBC
2018-05-31;nan;nan
2019-12-01;TRUE;TRUE
2019-12-05;TRUE;TBC
2019-12-15;TRUE;FALSE
2019-12-23;FALSE;nan

Then I called the code, like you did:

df=pd.read_csv(path, sep=';', engine='python')
df=df.apply(lambda x: x.astype(str).str.lower())
df['Checked'] = np.where(df['Checked'].eq('true'), True, False)
df['Verified'] = np.where(df['Verified'].eq('true') | df['Verified'].eq('tbc'), True, False)

The resulting dataframe looks like this:

         Date  Checked  Verified
0  2018-05-23    False      True
1  2018-05-24     True      True
2  2018-05-26    False      True
3  2018-05-31    False     False
4  2019-12-01     True      True
5  2019-12-05     True      True
6  2019-12-15     True     False
7  2019-12-23    False     False

Which python and pandas version do you use? I tested this with Python 3.6.1 and Pandas 1.04

Upvotes: 0

cs95
cs95

Reputation: 402303

You can do this for both columns via regular expressions and df.replace:

df.astype(str).replace({'(?i)True|TBC': True, '(?i)False|nan': False}, regex=True)

    Col1   Col2
0   True   True
1   True   True
2  False  False
3  False  False
4  False   True
5   True   True

The pattern is case insensitive.

Upvotes: 1

Related Questions