chuky pedro
chuky pedro

Reputation: 745

Pandas/Python: How to create new column based on values from other columns and apply extra condition to this new column

I have a pandas dataframe and I want to create a new column BB based on the below condition.

  1. Create a new column BB, if the values in column TGR1 is 0, assign 0 to BB else,
  2. The value in TGR1 is not 0, look up the columns ('1','2','3') that corresponds with the value in TGR1 assign the value in that column(either '1','2','3') to the new column BB.

I was able to achieve the first step using

df.loc[df['TGR1'] == 0, 'BB'] = 0

I also tried to use np.where to come up with but I can figure out the right way to go about this.

df['BB'] = np.where(df.TGR1 == 0,0, df.columns == test.TGR1.value )
    
    

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   2

Expected Output:

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2    BB     
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0     34.00        
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1     5.18     
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2       0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1     19.10     

Upvotes: 1

Views: 372

Answers (5)

rpanai
rpanai

Reputation: 13447

Here you can try to play some numpy trick as in this answer.

We first define a matrix with values from columns 1,2 and 3 and add a first column with zeros.

import pandas as pd
import numpy as np

# we first define a matrix 
# with len(df) rows and 4 columns
mat = np.zeros((len(df), 4))

# Then we fill the last 3 columns 
# with values from df
mat[:,1:] = df[["1", "2", "3"]].values

# Then a vector with values from df["TGR1"]
v = df["TGR1"].values


# Finally we take the given index
# from each row on matrix
df["BB"] = np.take_along_axis(mat, v[:,None], axis=1)

Timing

I compared the timing for some of the answers here. I just took a df 10_000 larger than the original one

df_bk = pd.concat([df for i in range(10_000)], ignore_index=True)

and before run each test I do df = df_bk.copy()

@wwnde's solution

CPU times: user 430 ms, sys: 12.1 ms, total: 442 ms
Wall time: 452 ms

@cookesd's solution

CPU times: user 746 ms, sys: 0 ns, total: 746 ms
Wall time: 746 ms

@rpanai's solution

CPU times: user 5.54 ms, sys: 0 ns, total: 5.54 ms
Wall time: 4.84 ms

@Psidom's solution

CPU times: user 5.93 ms, sys: 141 µs, total: 6.07 ms
Wall time: 5.61 ms

Psidom's solution and mine have basically the same timing. Here is a plot enter image description here

Upvotes: 4

wwnde
wwnde

Reputation: 26676

Drop TGR2temporarily, do alook up of columns using TGR1 and that should do. code below

s = df.astype(str).drop('TGR2',1).filter(regex='\d', axis=1).reset_index()#Drop TRG2 and filter rows with digits to allow lookup
i = s.astype(str).columns.get_indexer(s.TGR1)#DO alook up to get columns whose values are in TGR1
df['BB'] = s.values[s.index,i]
   Dist Track   EVENT_ID        Date     1     2     3 TGR1 TGR2    BB
0  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    1    0  34.0
1  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    2    1  5.18
2  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    0    2     0
3  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    3    1  19.1
4  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    2    2  5.18
5  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1    1    2  34.0

Upvotes: 0

Jorge L.
Jorge L.

Reputation: 116

this is done easy with the use of boolean mask as you did it in your step one:

df['BB'][df['TGR1'] == 0] = 0

for the other values greaters than 0:

df['BB'][df['TGR1'] == 1] = df['1'][df['TGR1'] == 1]
df['BB'][df['TGR1'] == 2] = df['2'][df['TGR1'] == 2]
df['BB'][df['TGR1'] == 3] = df['3'][df['TGR1'] == 3]

output:
    1         2       3   TGR1   BB
0   34.0    5.18    19.1    1   34.00
1   34.0    5.18    19.1    2   5.18
2   34.0    5.18    19.1    0   0.00
3   34.0    5.18    19.1    3   19.10
4   34.0    5.18    19.1    2   5.18

probably it is pretty much readable.

Upvotes: 1

cookesd
cookesd

Reputation: 1336

You can create the column using a list comprehension with your if-else logic

# Sample data
df = pd.DataFrame({'TGR1':[random.randint(0,3) for i in range(10)],
                   '1':[random.randint(0,100) for i in range(10)],
                   '2':[random.randint(101,200) for i in range(10)],
                   '3':[random.randint(201,300) for i in range(10)]})
# creating the column
df['BB'] = [0 if tgr1_val == 0 else df.loc[ind,str(tgr1_val)]
            for ind,tgr1_val in enumerate(df['TGR1'].values)]

df

#    TGR1   1    2    3   BB
# 0     0  54  107  217    0
# 1     2  71  128  277  128
# 2     1  25  103  269   25
# 3     0  80  112  279    0
# 4     2  98  167  228  167
# 5     3  26  192  285  285
# 6     0  27  107  228    0
# 7     2  13  103  298  103
# 8     3  28  196  289  289
# 9     2  72  186  251  186

Upvotes: 1

akuiper
akuiper

Reputation: 215117

One way is to use numpy advanced indexing:

import numpy as np
# extract columns 1,2,3 into a numpy array with a zeros column stacked on the left
vals = np.column_stack((np.zeros(len(df)), df[list('123')]))

vals
array([[ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ]])

# use TGR1 values as the column index to extract corresponding values
df['BB'] = vals[np.arange(len(df)), df.TGR1.values]

df
   Dist Track   EVENT_ID        Date     1     2     3  TGR1  TGR2     BB
0  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     0  34.00
1  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     1   5.18
2  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     0     2   0.00
3  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     3     1  19.10
4  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     2   5.18
5  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     2  34.00

Upvotes: 4

Related Questions