ShanZhengYang
ShanZhengYang

Reputation: 17631

How to create a pandas DataFrame column based on the existence of values in a subset of columns, by row?

I have a pandas DataFrame as follows:

import pandas as pd
data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
         "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
         "column3": [5, 1, 8, 3, 731, 189, 9], 
         "columnA" : [5, 0, 75, 150, 0, 0, 0], 
         "columnB" : [0, 32, 0, 96, 0, 51, 0], 
         "columnC" : [0, 42, 0, 42, 0, 42, 42]}

df = pd.DataFrame(data1)

df
>>>     column1   column2   column3   columnA   columnB   columnC
0         A       338         5         5         0         0
1         B       519         1         0        32        42
2         C       871         8        75         0         0
3         D      1731         3       150        96        42
4         E      2693       731         0         0         0
5         F      2963       189         0        51        42
6         G      3379         9         0         0        42

The values in columnA, columnB, and columnC are either an integer or zero. I would like to check values in columnA, columnB, and columnC such that if there is a integer in columnC and zeros in columns columnA and columnB.

If there is a value in columnC and zeros in columnA and columnB, I would like 1 to be in new column newcolumn. Otherwise, the values in newcolumn should be 0.

The resulting dataframe should be:

>>>     column1   column2   column3   columnA   columnB   columnC     newcolumn
0         A       338         5         5         0         0          0
1         B       519         1         0        32        42          0
2         C       871         8        75         0         0          0
3         D      1731         3       150        96        42          0
4         E      2693       731         0         0         0          0
5         F      2963       189         0        51        42          0
6         G      3379         9         0         0        42          1
....                 .....                         ...........

I'm aware of how to check values by column (e.g. using df.columnA == 0) and creating a new column is quite straightforward. However, how does one check this "by row"?

Upvotes: 1

Views: 350

Answers (4)

piRSquared
piRSquared

Reputation: 294288

# clever regex... might even make good screen name
# might want to use this instead
# v = df.reindex_axis(['columnA', 'columnB', 'columnC'], 1)).values == 0
v = df.filter(regex='[A-Za-z]$').values == 0
v[:, -1] = ~v[:, -1]  # negate the last column
df.assign(New=v.all(1).astype(np.uint8))

  column1  column2  column3  columnA  columnB  columnC  New
0       A      338        5        5        0        0    0
1       B      519        1        0       32       42    0
2       C      871        8       75        0        0    0
3       D     1731        3      150       96       42    0
4       E     2693      731        0        0        0    0
5       F     2963      189        0       51       42    0
6       G     3379        9        0        0       42    1

Its also pretty fast

time test

enter image description here

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

You can use DataFrame.eval method:

In [146]: df['newcolumn'] = df.eval("columnA == 0 and columnB == 0 and columnC != 0") \
                              .astype(np.uint8)

In [147]: df
Out[147]:
  column1  column2  column3  columnA  columnB  columnC  newcolumn
0       A      338        5        5        0        0          0
1       B      519        1        0       32       42          0
2       C      871        8       75        0        0          0
3       D     1731        3      150       96       42          0
4       E     2693      731        0        0        0          0
5       F     2963      189        0       51       42          0
6       G     3379        9        0        0       42          1

Upvotes: 2

Vaishali
Vaishali

Reputation: 38415

You can use np.where

df['newcolumn'] = np.where((df.columnA ==0) & (df.columnB == 0) & (df.columnC!= 0), 1, 0)


    column1 column2 column3 columnA columnB columnC newcolumn
0   A       338     5       5       0       0       0
1   B       519     1       0       32      42      0
2   C       871     8       75      0       0       0
3   D       1731    3       150     96      42      0
4   E       2693    731     0       0       0       0
5   F       2963    189     0       51      42      0
6   G       3379    9       0       0       42      1

Upvotes: 2

plasmon360
plasmon360

Reputation: 4199

you can use boolean & operator on multiple conditions as shown below

df['new column'] = (df['columnA'] == 0) & (df['columnB'] == 0) & (df['columnC'] != 0)
df['new column'] = df['new column'].astype(int)
df

results in

 column1  column2  column3  columnA  columnB  columnC  new column
0       A      338        5        5        0        0           0
1       B      519        1        0       32       42           0
2       C      871        8       75        0        0           0
3       D     1731        3      150       96       42           0
4       E     2693      731        0        0        0           0
5       F     2963      189        0       51       42           0
6       G     3379        9        0        0       42           1

Upvotes: 2

Related Questions