Reputation: 1145
I have a pandas DataFrame with four columns: ID, t, x1, and x2.
import pandas as pd
dat = {'ID': [1,1,1,1,2,2,2,3,3,3,3,4,4,4,5,5,6,6,6],
't': [0,1,2,3,0,1,2,0,1,2,3,0,1,2,0,1,0,1,2],
'x1' : [3.5,3.5,3.5,3.5,2.01,2.01,2.01,3.9,3.9,3.9,3.9,2.2,2.2,2.2,1.8,1.8,2.1,2.1,2.1],
'x2': [4,4,4,4,3,3,3,4,4,4,4,3,3,3,2,2,3,3,3]
}
df = pd.DataFrame(dat, columns = ['ID', 't', 'x1','x2'])
print(df)
I need to create a new column y, computed within each ID group, such that:
if t != max(t) then y = 1,
if t == max(t) then y = x1 - x2 + 1,
where max(t) is the largest t within that ID.
The output would look like:
Please note that I have millions of records, so the faster the solution, the better.
Upvotes: 0
Views: 221
Reputation: 323316
We can combine transform('max') with np.where:
import numpy as np
df['y'] = np.where(df.t != df.groupby('ID').t.transform('max'), 1, df.x1-df.x2+1)
df
Out[221]:
ID t x1 x2 y
0 1 0 3.50 4 1.00
1 1 1 3.50 4 1.00
2 1 2 3.50 4 1.00
3 1 3 3.50 4 0.50
4 2 0 2.01 3 1.00
5 2 1 2.01 3 1.00
6 2 2 2.01 3 0.01
7 3 0 3.90 4 1.00
8 3 1 3.90 4 1.00
9 3 2 3.90 4 1.00
10 3 3 3.90 4 0.90
11 4 0 2.20 3 1.00
12 4 1 2.20 3 1.00
13 4 2 2.20 3 0.20
14 5 0 1.80 2 1.00
15 5 1 1.80 2 0.80
16 6 0 2.10 3 1.00
17 6 1 2.10 3 1.00
18 6 2 2.10 3 0.10
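To make the one-liner easier to follow, here is a minimal sketch that splits it into steps on a subset of the question's data (IDs 1 and 2 only). The intermediate names t_max and is_last are not from the original answer; they are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: IDs 1 and 2 only.
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2],
    't':  [0, 1, 2, 3, 0, 1, 2],
    'x1': [3.5, 3.5, 3.5, 3.5, 2.01, 2.01, 2.01],
    'x2': [4, 4, 4, 4, 3, 3, 3],
})

# transform('max') returns a Series aligned with df's index,
# holding the group-wise maximum of t on every row of that group.
t_max = df.groupby('ID')['t'].transform('max')  # hypothetical name

# Boolean mask: True where a row's t equals its group's maximum.
is_last = df['t'].eq(t_max)  # hypothetical name

# Vectorized choice: x1 - x2 + 1 where the mask is True, 1 elsewhere.
df['y'] = np.where(is_last, df['x1'] - df['x2'] + 1, 1)
print(df)
```

Because both transform and np.where operate on whole columns rather than row by row, this avoids a Python-level loop over groups, which is likely what matters most with millions of records.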
Upvotes: 3