Tan Wei Han

Reputation: 1

Data Cleaning with Pandas in Python

I am trying to clean a CSV file for data analysis. How do I convert TRUE/FALSE values into 1 and 0?

The results I found on Google suggest df.somecolumn = df.somecolumn.astype(int). However, this CSV file has 100 columns, and not every column is TRUE/FALSE (some are categorical, some are numerical). How can I write one sweeping piece of code that converts every TRUE/FALSE column to 1 and 0 without typing 50 lines of df.somecolumn = df.somecolumn.astype(int)?

Upvotes: 0

Views: 138

Answers (3)

user1695639

Reputation: 71

I would do it like this:

df.somecolumn = df.somecolumn.apply(lambda x: 1 if x=="TRUE" else 0)

If you want to iterate through all your columns and check whether they contain TRUE/FALSE values, you can do this:

for c in df:
    # `'TRUE' in df[c]` would test the index labels, so check the values with isin
    if df[c].isin(['TRUE', 'FALSE']).any():
        df[c] = df[c].apply(lambda x: 1 if x == 'TRUE' else 0)

Note that this approach is case-sensitive and won't work well if a column mixes TRUE/FALSE with other values, because every value other than 'TRUE' becomes 0.
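
For columns that hold TRUE/FALSE as text in mixed case, a more defensive variant of the loop above might look like the sketch below. The helper name true_false_strings_to_int and the sample data are made up for illustration, and it assumes a column should be converted only when every value is some spelling of TRUE or FALSE.

```python
import pandas as pd


def true_false_strings_to_int(df):
    """Return a copy in which all-TRUE/FALSE columns are converted to 1/0."""
    out = df.copy()
    mapping = {"TRUE": 1, "FALSE": 0}
    for col in out.columns:
        # Normalise to upper-case text so "true", "True" and " TRUE " all match.
        upper = out[col].astype(str).str.strip().str.upper()
        # Convert only when every value is TRUE or FALSE; a column that mixes
        # in other values (or missing values) is left untouched.
        if upper.isin(list(mapping)).all():
            out[col] = upper.map(mapping).astype(int)
    return out


# Hypothetical sample data: "city" contains a stray "TRUE" among other values.
df = pd.DataFrame({
    "flag": ["TRUE", "false", "True"],
    "city": ["Oslo", "TRUE", "Lima"],
    "n": [1, 2, 3],
})
cleaned = true_false_strings_to_int(df)
```

Here only the flag column is converted; city and n keep their original values.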

Upvotes: 0

Mark Wang

Reputation: 2757

A slightly different approach. First, the dtypes of a DataFrame can be returned using df.dtypes, which gives a pandas Series that looks like this:

a     int64
b      bool
c    object
dtype: object

Second, we can replace bool with an int type using replace:

df.dtypes.replace('bool', 'int8')

This gives:

a     int64
b      int8
c    object
dtype: object

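Finally, a pandas Series behaves much like a dictionary mapping column names to dtypes, so it can be passed to pd.DataFrame.astype.

We can write it as a one-liner (astype returns a new DataFrame, so assign the result back to keep it):

df = df.astype(df.dtypes.replace('bool', 'int8'))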
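
As a runnable sketch of the same idea, the dtype mapping can also be built explicitly as a plain dict instead of calling replace on df.dtypes. This swaps in pd.api.types.is_bool_dtype for the string comparison; the sample DataFrame and the name bool_to_int8 are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data mirroring the a/b/c columns shown above.
df = pd.DataFrame({"a": [1, 2], "b": [True, False], "c": ["x", "y"]})

# Map each bool column to int8; columns absent from the dict keep their dtype.
bool_to_int8 = {
    col: "int8"
    for col, dtype in df.dtypes.items()
    if pd.api.types.is_bool_dtype(dtype)
}

# astype accepts a {column: dtype} mapping and returns a new DataFrame.
converted = df.astype(bool_to_int8)
```

Only column b changes dtype here; a and c are returned as they were.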

Upvotes: 0

You can select the bool columns by dtype and convert them in one assignment:

bool_cols = df.select_dtypes(include='bool').columns
df[bool_cols] = df[bool_cols].astype(int)
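
To tie this back to the CSV in the question: pd.read_csv by default parses a column that contains only TRUE/FALSE as bool dtype, so selecting by dtype is usually enough. A minimal sketch, assuming made-up sample data and selecting the bool column names first before assigning to them:

```python
import io

import pandas as pd

# Hypothetical CSV contents; a real file path would be passed to read_csv instead.
csv_text = (
    "id,active,city,verified\n"
    "1,TRUE,Oslo,FALSE\n"
    "2,FALSE,Lima,TRUE\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Columns holding only TRUE/FALSE were parsed as bool; collect their names.
bool_cols = df.select_dtypes(include="bool").columns

# Replace those columns with their integer (1/0) equivalents.
df[bool_cols] = df[bool_cols].astype(int)
```

A column that also contains missing or other values would not be parsed as bool and so would be skipped by this selection.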

Upvotes: 4
