Reputation: 17631
I have a pandas DataFrame as follows:
import pandas as pd
data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
"column2": [338, 519, 871, 1731, 2693, 2963, 3379],
"column3": [5, 1, 8, 3, 731, 189, 9],
"columnA" : [5, 0, 75, 150, 0, 0, 0],
"columnB" : [0, 32, 0, 96, 0, 51, 0],
"columnC" : [0, 42, 0, 42, 0, 42, 42]}
df = pd.DataFrame(data1)
df
>>> column1 column2 column3 columnA columnB columnC
0 A 338 5 5 0 0
1 B 519 1 0 32 42
2 C 871 8 75 0 0
3 D 1731 3 150 96 42
4 E 2693 731 0 0 0
5 F 2963 189 0 51 42
6 G 3379 9 0 0 42
The values in columnA
, columnB
, and columnC
are either an integer or zero. I would like to check values in columnA
, columnB
, and columnC
such that if there is a integer in columnC
and zeros in columns columnA
and columnB
.
If there is a value in columnC
and zeros in columnA
and columnB
, I would like 1 to be in new column newcolumn
. Otherwise, the values in newcolumn
should be 0.
The resulting dataframe should be:
>>> column1 column2 column3 columnA columnB columnC newcolumn
0 A 338 5 5 0 0 0
1 B 519 1 0 32 42 0
2 C 871 8 75 0 0 0
3 D 1731 3 150 96 42 0
4 E 2693 731 0 0 0 0
5 F 2963 189 0 51 42 0
6 G 3379 9 0 0 42 1
.... ..... ...........
I'm aware of how to check values by column (e.g. using df.columnA == 0
) and creating a new column is quite straightforward. However, how does one check this "by row"?
Upvotes: 1
Views: 350
Reputation: 294288
# clever regex... might even make good screen name
# might want to use this instead
# v = df.reindex_axis(['columnA', 'columnB', 'columnC'], 1)).values == 0
v = df.filter(regex='[A-Za-z]$').values == 0
v[:, -1] = ~v[:, -1] # negate the last column
df.assign(New=v.all(1).astype(np.uint8))
column1 column2 column3 columnA columnB columnC New
0 A 338 5 5 0 0 0
1 B 519 1 0 32 42 0
2 C 871 8 75 0 0 0
3 D 1731 3 150 96 42 0
4 E 2693 731 0 0 0 0
5 F 2963 189 0 51 42 0
6 G 3379 9 0 0 42 1
Its also pretty fast
time test
Upvotes: 2
Reputation: 210842
You can use DataFrame.eval method:
In [146]: df['newcolumn'] = df.eval("columnA == 0 and columnB == 0 and columnC != 0") \
.astype(np.uint8)
In [147]: df
Out[147]:
column1 column2 column3 columnA columnB columnC newcolumn
0 A 338 5 5 0 0 0
1 B 519 1 0 32 42 0
2 C 871 8 75 0 0 0
3 D 1731 3 150 96 42 0
4 E 2693 731 0 0 0 0
5 F 2963 189 0 51 42 0
6 G 3379 9 0 0 42 1
Upvotes: 2
Reputation: 38415
You can use np.where
df['newcolumn'] = np.where((df.columnA ==0) & (df.columnB == 0) & (df.columnC!= 0), 1, 0)
column1 column2 column3 columnA columnB columnC newcolumn
0 A 338 5 5 0 0 0
1 B 519 1 0 32 42 0
2 C 871 8 75 0 0 0
3 D 1731 3 150 96 42 0
4 E 2693 731 0 0 0 0
5 F 2963 189 0 51 42 0
6 G 3379 9 0 0 42 1
Upvotes: 2
Reputation: 4199
you can use boolean &
operator on multiple conditions as shown below
df['new column'] = (df['columnA'] == 0) & (df['columnB'] == 0) & (df['columnC'] != 0)
df['new column'] = df['new column'].astype(int)
df
results in
column1 column2 column3 columnA columnB columnC new column
0 A 338 5 5 0 0 0
1 B 519 1 0 32 42 0
2 C 871 8 75 0 0 0
3 D 1731 3 150 96 42 0
4 E 2693 731 0 0 0 0
5 F 2963 189 0 51 42 0
6 G 3379 9 0 0 42 1
Upvotes: 2