Reputation: 745
I have a pandas dataframe and I want to create a new column BB based on the below condition.
I was able to achieve the first step using
df.loc[df['TGR1'] == 0, 'BB'] = 0
I also tried to use np.where to come up with but I can figure out the right way to go about this.
df['BB'] = np.where(df.TGR1 == 0,0, df.columns == test.TGR1.value )
Dist Track EVENT_ID Date 1 2 3 TGR1 TGR2
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 1 0
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 2 1
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 0 2
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 3 1
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 2 2
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 1 2
Expected Output:
Dist Track EVENT_ID Date 1 2 3 TGR1 TGR2 BB
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 1 0 34.00
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 2 1 5.18
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 0 2 0
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 3 1 19.10
Upvotes: 1
Views: 372
Reputation: 13447
Here you can try to play some numpy trick as in this answer.
We first define a matrix with values from columns 1,2 and 3 and add a first column with zeros.
import pandas as pd
import numpy as np
# we first define a matrix
# with len(df) rows and 4 columns
mat = np.zeros((len(df), 4))
# Then we fill the last 3 columns
# with values from df
mat[:,1:] = df[["1", "2", "3"]].values
# Then a vector with values from df["TGR1"]
v = df["TGR1"].values
# Finally we take the given index
# from each row on matrix
df["BB"] = np.take_along_axis(mat, v[:,None], axis=1)
I compared the timing for some of the answers here. I just took a df
10_000 larger than the original one
df_bk = pd.concat([df for i in range(10_000)], ignore_index=True)
and before run each test I do df = df_bk.copy()
CPU times: user 430 ms, sys: 12.1 ms, total: 442 ms
Wall time: 452 ms
CPU times: user 746 ms, sys: 0 ns, total: 746 ms
Wall time: 746 ms
CPU times: user 5.54 ms, sys: 0 ns, total: 5.54 ms
Wall time: 4.84 ms
CPU times: user 5.93 ms, sys: 141 µs, total: 6.07 ms
Wall time: 5.61 ms
Psidom's solution and mine have basically the same timing. Here is a plot
Upvotes: 4
Reputation: 26676
Drop TGR2temporarily, do alook up of columns using TGR1 and that should do. code below
s = df.astype(str).drop('TGR2',1).filter(regex='\d', axis=1).reset_index()#Drop TRG2 and filter rows with digits to allow lookup
i = s.astype(str).columns.get_indexer(s.TGR1)#DO alook up to get columns whose values are in TGR1
df['BB'] = s.values[s.index,i]
Dist Track EVENT_ID Date 1 2 3 TGR1 TGR2 BB
0 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 1 0 34.0
1 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 2 1 5.18
2 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 0 2 0
3 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 3 1 19.1
4 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 2 2 5.18
5 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 1 2 34.0
Upvotes: 0
Reputation: 116
this is done easy with the use of boolean mask as you did it in your step one:
df['BB'][df['TGR1'] == 0] = 0
for the other values greaters than 0:
df['BB'][df['TGR1'] == 1] = df['1'][df['TGR1'] == 1]
df['BB'][df['TGR1'] == 2] = df['2'][df['TGR1'] == 2]
df['BB'][df['TGR1'] == 3] = df['3'][df['TGR1'] == 3]
output:
1 2 3 TGR1 BB
0 34.0 5.18 19.1 1 34.00
1 34.0 5.18 19.1 2 5.18
2 34.0 5.18 19.1 0 0.00
3 34.0 5.18 19.1 3 19.10
4 34.0 5.18 19.1 2 5.18
probably it is pretty much readable.
Upvotes: 1
Reputation: 1336
You can create the column using a list comprehension with your if-else logic
# Sample data
df = pd.DataFrame({'TGR1':[random.randint(0,3) for i in range(10)],
'1':[random.randint(0,100) for i in range(10)],
'2':[random.randint(101,200) for i in range(10)],
'3':[random.randint(201,300) for i in range(10)]})
# creating the column
df['BB'] = [0 if tgr1_val == 0 else df.loc[ind,str(tgr1_val)]
for ind,tgr1_val in enumerate(df['TGR1'].values)]
df
# TGR1 1 2 3 BB
# 0 0 54 107 217 0
# 1 2 71 128 277 128
# 2 1 25 103 269 25
# 3 0 80 112 279 0
# 4 2 98 167 228 167
# 5 3 26 192 285 285
# 6 0 27 107 228 0
# 7 2 13 103 298 103
# 8 3 28 196 289 289
# 9 2 72 186 251 186
Upvotes: 1
Reputation: 215117
One way is to use numpy advanced indexing:
import numpy as np
# extract columns 1,2,3 into a numpy array with a zeros column stacked on the left
vals = np.column_stack((np.zeros(len(df)), df[list('123')]))
vals
array([[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ]])
# use TGR1 values as the column index to extract corresponding values
df['BB'] = vals[np.arange(len(df)), df.TGR1.values]
df
Dist Track EVENT_ID Date 1 2 3 TGR1 TGR2 BB
0 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 1 0 34.00
1 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 2 1 5.18
2 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 0 2 0.00
3 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 3 1 19.10
4 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 2 2 5.18
5 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 1 2 34.00
Upvotes: 4