Reputation: 1
I am trying to clean a CSV file for data analysis. How do I convert TRUE/FALSE values into 1 and 0?
When I searched Google, the suggestion was df.somecolumn = df.somecolumn.astype(int). However, this CSV file has 100 columns and not every column is TRUE/FALSE (some are categorical, some are numerical). How can I write sweeping code that converts every TRUE/FALSE column to 1 and 0 without typing 50 lines of df.somecolumn = df.somecolumn.astype(int)?
Upvotes: 0
Views: 138
Reputation: 71
I would do it like this:
df.somecolumn = df.somecolumn.apply(lambda x: 1 if x=="TRUE" else 0)
If you want to iterate through all your columns and check whether they contain TRUE/FALSE values, you can do this:
for c in df:
    # isin() tests the values; the `in` operator on a Series would test its index labels
    if df[c].isin(['TRUE', 'FALSE']).any():
        df[c] = df[c].apply(lambda x: 1 if x == 'TRUE' else 0)
Note that this approach is case-sensitive, and it won't work well if the TRUE/FALSE values in a column are mixed with others: every value other than 'TRUE' becomes 0.
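To sidestep the mixed-column problem, one option is to convert a column only when every non-missing value is exactly 'TRUE' or 'FALSE'. This is a minimal sketch, assuming the values were loaded as strings; the sample frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are invented for this example.
df = pd.DataFrame({
    "flag": ["TRUE", "FALSE", "TRUE"],
    "mixed": ["TRUE", "maybe", "FALSE"],
    "score": [1.5, 2.0, 3.25],
})

mapping = {"TRUE": 1, "FALSE": 0}

for col in df.columns:
    values = df[col].dropna()
    # Convert only when every non-missing value is exactly "TRUE" or "FALSE".
    if len(values) > 0 and values.isin(list(mapping)).all():
        df[col] = df[col].map(mapping)
```

Here only "flag" is converted; "mixed" and "score" are left untouched.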
Upvotes: 0
Reputation: 2757
A slightly different approach.
First, the dtypes of a dataframe can be returned using df.dtypes, which gives a pandas Series that looks like this:
a int64
b bool
c object
dtype: object
Second, we can replace bool with an int type using replace, df.dtypes.replace('bool', 'int8'), which gives:
a int64
b int8
c object
dtype: object
Finally, a pandas Series is essentially a dictionary, so it can be passed to pd.DataFrame.astype.
We can write it as a one-liner (astype returns a new dataframe, so assign the result):
df = df.astype(df.dtypes.replace('bool', 'int8'))
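As a variant of the same idea, the column-to-dtype mapping can also be built explicitly instead of going through replace. This is only a sketch under that assumption; the sample frame mirrors the a/b/c columns above, and the names target and converted are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Sample frame mirroring the a/b/c example above.
df = pd.DataFrame({"a": [1, 2], "b": [True, False], "c": ["x", "y"]})

# Map each boolean column to int8; columns left out of the dict keep their dtype.
target = {col: "int8" for col, dtype in df.dtypes.items() if is_bool_dtype(dtype)}

# astype returns a new dataframe rather than modifying df in place.
converted = df.astype(target)
```

Passing a dict that names only some columns is enough, since astype leaves the unnamed columns as they are.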
Upvotes: 0
Reputation: 743
You can use:
bool_cols = df.select_dtypes(include='bool').columns
df[bool_cols] = df[bool_cols].astype(int)
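For the CSV case in the question, it may help that pd.read_csv parses a column containing only TRUE/FALSE as bool by default, so select_dtypes can pick those columns up directly. The result of select_dtypes cannot be assigned to, so the sketch below goes through the column labels instead. The CSV text and column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "id,active,kind\n1,TRUE,a\n2,FALSE,b\n"
df = pd.read_csv(io.StringIO(csv_text))

# Collect the labels of the boolean columns, then convert just those columns.
bool_cols = df.select_dtypes(include="bool").columns
df[bool_cols] = df[bool_cols].astype(int)
```

Columns where TRUE/FALSE is mixed with other text are read as strings instead, so they are not selected here.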
Upvotes: 4