Reputation: 187
Here's some data:
test = pd.DataFrame([[np.nan,"cat","mouse", 'tiger'],
["tiger","dog","elephant", "mouse"],
["cat",np.nan,"giraffe", "cat"],
[np.nan,np.nan,"ant", "ant"]], columns=["animal1","animal2","animal3", "animal4"])
I want to convert all the NaNs to 0 and all the responses to 1.
#First I convert all NaNs to 0 and make everything string
test = test.fillna(0)
test = test.astype(str)
Then I create a list of the columns of interest (this doesn't make sense in this example because there are only 2 columns, but in my actual case there are plenty)
op = test.iloc[:,0:2].columns.tolist()
I would've thought I could just do this:
test[op] = [1 if x != '0' else 0 for x in test[op]]
But it doesn't work as it converts everything to 1.
I then tried to do by each column manually and it does work:
test['animal1'] = [1 if x != '0' else 0 for x in test['animal1']]
Anyone know why the latter way works but not the former? And any guidance on how to make it work would be appreciated.
Edit/update: SeaBean provided a solution that works (Thanks!!). I'd still be interested to know why the method I used only worked when doing it one column at a time (manually).
Upvotes: 0
Views: 103
Reputation: 23217
You can use .notna()
and convert to 0/1 by .astype()
, as follows:
test.notna().astype(int)
Result:
animal1 animal2 animal3 animal4
0 0 1 1 1
1 1 1 1 1
2 1 0 1 1
3 0 0 1 1
To explain why your tried method only worked when doing it one column at a time but not for multiple columns:
When you work on one column at a time, and you specify e.g. test['animal1']
in the list comprehension, you are iterating over the elements of a Pandas Series of the related column. This will perform task as you expect.
However, when you do it in multiple columns by including test[op]
in the list comprehension, here instead of a Pandas Series, test[op]
is a dataframe. When you iterate over this dataframe, you are only getting the column labels of the dataframe. You will understand when you try the following list comprehension:
[x for x in test[op]]
which gives:
['animal1', 'animal2']
Therefore, in your list comprehension for multiple columns, your comparison of x != '0'
will always return true and give all 1's since the column labels you are comparing to do not contain '0'.
Upvotes: 3
Reputation: 9207
You can use .isna() and invert the results:
print(~test.isna())
animal1 animal2 animal3 animal4
0 False True True True
1 True True True True
2 True False True True
3 False False True True
If you rather have 0 and 1 multiply it by 1:
print((~test.isna())*1)
animal1 animal2 animal3 animal4
0 0 1 1 1
1 1 1 1 1
2 1 0 1 1
3 0 0 1 1
Upvotes: 3