Jameson
Jameson

Reputation: 187

Converting all non NaNs values into 1 using list comprehension across multiple columns

Here's some data:

test = pd.DataFrame([[np.nan,"cat","mouse", 'tiger'],
    ["tiger","dog","elephant", "mouse"],
    ["cat",np.nan,"giraffe", "cat"],
    [np.nan,np.nan,"ant", "ant"]],  columns=["animal1","animal2","animal3", "animal4"])

I want to convert all the NaNs to 0 and all the responses to 1.

#First I convert all NaNs to 0 and make everything string
test = test.fillna(0)
test = test.astype(str)

Then I create a list of the columns of interest (this doesn't make sense in this example because there are only 2 columns, but in my actual case there are plenty)

op = test.iloc[:,0:2].columns.tolist()

I would've thought I could just do this:

test[op] = [1 if x != '0' else 0 for x in test[op]]

But it doesn't work as it converts everything to 1.

I then tried to do by each column manually and it does work:

test['animal1'] = [1 if x != '0' else 0 for x in test['animal1']]

Anyone know why the latter way works but not the former? And any guidance on how to make it work would be appreciated.

Edit/update: SeaBean provided a solution that works (Thanks!!). I'd still be interested to know why the method I used only worked when doing it one column at a time (manually).

Upvotes: 0

Views: 103

Answers (2)

SeaBean
SeaBean

Reputation: 23217

You can use .notna() and convert to 0/1 by .astype(), as follows:

test.notna().astype(int)

Result:

   animal1  animal2  animal3  animal4
0        0        1        1        1
1        1        1        1        1
2        1        0        1        1
3        0        0        1        1

Edit

To explain why your tried method only worked when doing it one column at a time but not for multiple columns:

When you work on one column at a time, and you specify e.g. test['animal1'] in the list comprehension, you are iterating over the elements of a Pandas Series of the related column. This will perform task as you expect.

However, when you do it in multiple columns by including test[op] in the list comprehension, here instead of a Pandas Series, test[op] is a dataframe. When you iterate over this dataframe, you are only getting the column labels of the dataframe. You will understand when you try the following list comprehension:

[x for x in test[op]]

which gives:

['animal1', 'animal2']

Therefore, in your list comprehension for multiple columns, your comparison of x != '0' will always return true and give all 1's since the column labels you are comparing to do not contain '0'.

Upvotes: 3

Andreas
Andreas

Reputation: 9207

You can use .isna() and invert the results:

print(~test.isna())

   animal1  animal2  animal3  animal4
0    False     True     True     True
1     True     True     True     True
2     True    False     True     True
3    False    False     True     True

If you rather have 0 and 1 multiply it by 1:

print((~test.isna())*1)

   animal1  animal2  animal3  animal4
0        0        1        1        1
1        1        1        1        1
2        1        0        1        1
3        0        0        1        1

Upvotes: 3

Related Questions