john doe

Reputation: 2253

Problems with pandas and numpy where condition/multiple values?

I have the following pandas dataframe:

A  B
1  3
0  3
1  2
0  1
0  0
1  4
....
0  0

I would like to add a new column on the right, based on the following condition:

If the value in B is 3 or 2, put 1 in new_col (and 0 otherwise). For instance:

(*)
A  B new_col
1  3  1
0  3  1
1  2  1
0  1  0
0  0  0
1  4  0
....
0  0  0

So I tried the following:

df['new_col'] = np.where(df['B'] == 3 & 2,'1','0')

However, it did not work:

A  B new_col
1  3  0
0  3  0
1  2  1
0  1  0
0  0  0
1  4  0
....
0  0  0

Any idea how to write a multiple-condition statement with pandas and numpy that produces (*)?

Upvotes: 3

Views: 8300

Answers (5)

PagMax

Reputation: 8568

import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 0, 0, 1], 'B': [3, 3, 2, 1, 0, 4]})
print(df)
# 1 where B is 2 or 3, otherwise 0
df['C'] = [1 if vals == 2 or vals == 3 else 0 for vals in df['B']]
print(df)

   A  B
0  1  3
1  0  3
2  1  2
3  0  1
4  0  0
5  1  4
   A  B  C
0  1  3  1
1  0  3  1
2  1  2  1
3  0  1  0
4  0  0  0
5  1  4  0

Upvotes: 1

piRSquared

Reputation: 294218

Using numpy:

df['new'] = (df.B.values[:, None] == np.array([2, 3])).any(1) * 1
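
To see why this one-liner works, it can help to look at the intermediate array. A minimal sketch, assuming the sample frame from the question (the names targets and matches are only illustrative and not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 0, 0, 1], 'B': [3, 3, 2, 1, 0, 4]})

targets = np.array([2, 3])  # illustrative name for the values to match

# A (6, 1) column compared against a (2,) array broadcasts to a (6, 2)
# boolean array: column 0 is "B == 2", column 1 is "B == 3".
matches = df['B'].values[:, None] == targets

# A row qualifies if either comparison is True; astype(int) maps
# True/False to 1/0 (equivalent to multiplying by 1).
df['new'] = matches.any(axis=1).astype(int)
print(df)
```

Adding more values to targets extends the check without changing the rest of the expression.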

Timing

[image: timing over the given data set]

[image: timing over 60,000 rows]

Upvotes: 1

Joe T. Boka

Reputation: 6589

You can use pandas isin, which returns a boolean Series showing whether each element of column 'B' is among the values you're looking for.

df['new_col'] = df['B'].isin([3, 2])

Output:

   A  B new_col
0  1  3    True
1  0  3    True
2  1  2    True
3  0  1   False
4  0  0   False
5  1  4   False

Then you can use astype to convert the boolean values to integers, with True becoming 1 and False becoming 0:

df['new_col'] = df['B'].isin([3, 2]).astype(int)

Output:

   A  B  new_col
0  1  3        1
1  0  3        1
2  1  2        1
3  0  1        0
4  0  0        0
5  1  4        0
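
If labels other than 0 and 1 are needed, the same isin mask can be passed to np.where, which the question was already using. A small sketch under that assumption (the 'yes'/'no' labels and the column name label are hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 0, 0, 1], 'B': [3, 3, 2, 1, 0, 4]})

# Boolean mask: True where B is 2 or 3.
mask = df['B'].isin([3, 2])

# np.where picks the second argument where mask is True, else the third.
df['new_col'] = np.where(mask, 1, 0)
df['label'] = np.where(mask, 'yes', 'no')  # hypothetical labels
print(df)
```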

Upvotes: 3

Israel Unterman

Reputation: 13510

df['new_col'] = [1 if x in [2, 3] else 0 for x in df.B]

Python's bool is a subclass of int, so the operators * + ^ work on booleans as expected, and mixing booleans with integers gives an integer result. So you can also do:

df['new_col'] = [(x in [2, 3]) * 1 for x in df.B]
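
The boolean arithmetic behind that expression can be checked in isolation. A minimal sketch using plain Python lists (the names flags and as_ints are illustrative):

```python
# bool is a subclass of int, so True behaves as 1 and False as 0.
assert issubclass(bool, int)

values = [3, 3, 2, 1, 0, 4]

# Membership test gives booleans; multiplying by 1 gives plain ints.
flags = [x in [2, 3] for x in values]
as_ints = [flag * 1 for flag in flags]

print(flags)
print(as_ints)
print(True + True)   # integer addition: 2
print(True ^ True)   # XOR of two bools stays a bool: False
```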

Upvotes: 2

Nehal J Wani

Reputation: 16629

Using numpy:

>>> df['new_col'] = np.where(np.logical_or(df['B'] == 3, df['B'] == 2), '1','0')
>>> df
   A  B new_col
0  1  3       1
1  0  3       1
2  1  2       1
3  0  1       0
4  0  0       0
5  1  4       0
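
As for why the original attempt marked only the B == 2 row: in Python, & binds more tightly than ==, so 3 & 2 is evaluated first as a bitwise AND, which gives 2. The following sketch, assuming the sample data from the question, shows this along with an equivalent form using parentheses and | (the names attempt and mask are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 0, 0, 1], 'B': [3, 3, 2, 1, 0, 4]})

# 0b11 & 0b10 == 0b10, so the original condition collapses to B == 2.
assert (3 & 2) == 2
attempt = df['B'] == 3 & 2
assert attempt.equals(df['B'] == 2)

# Parenthesise each comparison and combine them with | (element-wise OR).
mask = (df['B'] == 3) | (df['B'] == 2)
df['new_col'] = np.where(mask, 1, 0)
print(df)
```

Passing 1 and 0 rather than '1' and '0' keeps the new column numeric instead of a column of strings.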

Upvotes: 2
