Reputation:
I have the following df
col_1 col_2
1 1
1 2
1 3
1 6
1 8
1 11
1 12
1 19
1 24
1 1
1 1
1 2
1 2
1 3
1 3
2 1
2 2
2 4
2 6
2 7
2 11
2 13
2 16
2 19
2 1
2 2
2 3
I would like to do a kind of groupby on col_1 and replace the values 1, 2, 3 that occur after 19 in col_2 with 25, 26, 27.
Expected Output:
col_1 col_2
1 1
1 2
1 3
1 6
1 8
1 11
1 12
1 19
1 24
1 25
1 25
1 26
1 26
1 27
1 27
2 1
2 2
2 4
2 6
2 7
2 11
2 13
2 16
2 19
2 25
2 26
2 27
I would like to know how this can be done using pandas.
Thanks
Edit 1:
My real df
ContextID BacksGas_Flow_sccm StepID
7289973 1.953125 1
7289973 2.05078125 2
7289973 2.05078125 2
7289973 2.05078125 2
7289973 1.953125 2
7289973 1.7578125 2
7289973 1.7578125 2
7289973 1.85546875 2
7289973 1.7578125 2
7289973 9.08203125 5
7289973 46.19140625 5
7289973 46.19140625 5
7289973 46.19140625 5
7289973 46.19140625 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 5
7289973 46.6796875 7
7289973 46.6796875 7
7289973 46.6796875 7
7289973 46.6796875 12
7289973 46.6796875 12
7289973 46.6796875 12
7289973 46.6796875 12
7289973 46.6796875 12
7289973 46.6796875 12
7289973 46.6796875 12
7289973 46.6796875 15
7289973 46.6796875 15
7289973 46.6796875 16
7289973 46.6796875 16
7289973 46.6796875 17
7289973 25.09765625 19
7289973 45.99609375 19
7289973 59.08203125 19
7289973 61.81640625 19
7289973 62.59765625 19
7289973 63.671875 19
7289973 65.625 19
7289973 66.69921875 19
7289973 67.3828125 19
7289973 67.3828125 19
7289973 67.67578125 19
7289973 68.26171875 19
7289973 69.04296875 19
7289973 69.82421875 19
7289973 69.82421875 19
7289973 70.8984375 19
7289973 70.8984375 19
7289973 70.8984375 19
7289973 70.8984375 19
7289973 71.6796875 19
7289973 71.6796875 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 72.55859375 19
7289973 73.33984375 19
7289973 73.33984375 19
7289973 73.33984375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 73.2421875 19
7289973 73.2421875 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.0234375 19
7289973 74.90234375 19
7289973 74.90234375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 75 19
7289973 75 19
7289973 75 19
7289973 74.21875 19
7289973 74.21875 19
7289973 74.21875 19
7289973 75 19
7289973 75 19
7289973 75 19
7289973 75 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.12109375 19
7289973 74.90234375 19
7289973 6.4453125 24
7289973 3.515625 24
7289973 2.5390625 24
7289973 2.05078125 24
7289973 2.05078125 24
7289973 2.05078125 24
7289973 1.953125 24
7289973 1.953125 24
7289973 1.953125 24
7289973 1.953125 24
7289973 2.05078125 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 2.05078125 24
7289973 1.953125 24
7289973 1.953125 24
7289973 1.7578125 24
7289973 1.66015625 24
7289973 1.7578125 24
7289973 1.7578125 24
7289973 1.7578125 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 1.953125 24
7289973 1.953125 24
7289973 1.953125 24
7289973 1.953125 24
7289973 1.953125 24
7289973 1.7578125 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 1.7578125 24
7289973 1.85546875 24
7289973 1.85546875 24
7289973 1.7578125 24
7289973 1.7578125 1
7289973 1.85546875 1
7289973 1.85546875 1
7289973 1.85546875 2
7289973 1.7578125 2
7289973 1.953125 2
7289973 1.953125 2
7289973 1.85546875 2
7289973 1.85546875 3
7289973 1.85546875 3
7289973 1.85546875 3
7289973 1.953125 3
7289973 1.85546875 3
7289973 1.953125 3
7289973 1.85546875 3
7289973 1.7578125 3
7289973 1.85546875 3
7289973 1.85546875 3
7289973 1.7578125 3
7289973 1.85546875 3
Upvotes: 2
Views: 83
Reputation: 9941
Based on your real DataFrame, you can do the following:
# each drop in StepID marks a restart; bump later values by another 24 per restart
df['StepID'] = df.groupby('ContextID')['StepID'].apply(
    lambda x: x + (x < x.shift(1)).cumsum() * 24)
print(df.tail(25))
Output:
ContextID BacksGas_Flow_sccm StepID
138 7289973 1.855469 24
139 7289973 1.757812 24
140 7289973 1.855469 24
141 7289973 1.855469 24
142 7289973 1.757812 24
143 7289973 1.757812 25
144 7289973 1.855469 25
145 7289973 1.855469 25
146 7289973 1.855469 26
147 7289973 1.757812 26
148 7289973 1.953125 26
149 7289973 1.953125 26
150 7289973 1.855469 26
151 7289973 1.855469 27
152 7289973 1.855469 27
153 7289973 1.855469 27
154 7289973 1.953125 27
155 7289973 1.855469 27
156 7289973 1.953125 27
157 7289973 1.855469 27
158 7289973 1.757812 27
159 7289973 1.855469 27
160 7289973 1.855469 27
161 7289973 1.757812 27
162 7289973 1.855469 27
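The trick here is that within each group StepID only decreases when the sequence restarts, so (x < x.shift(1)).cumsum() counts how many restarts have happened so far. A minimal sketch with made-up values:
import pandas as pd
s = pd.Series([1, 19, 24, 1, 2, 3])       # made-up single-group series
wraps = (s < s.shift(1)).cumsum()          # 0 before the restart, 1 after it
print(wraps.tolist())                      # [0, 0, 0, 1, 1, 1]
print((s + wraps * 24).tolist())           # [1, 19, 24, 25, 26, 27]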
P.S. In your sample data there is only one ContextID, but I'm assuming there might be others as well in the full dataset, so I've added a groupby.
Update: the following is for the case where you only need to increment the values after 24 once per ContextID, by 25 (I'm saving the new values to a StepID_new column to show the before and after of the transformation):
import numpy as np
import pandas as pd

x = [1, 2, 3, 19, 19, 24, 1, 1, 2, 2, 3, 3, 2, 3, 1]
df = pd.DataFrame({'ContextID': np.repeat([1, 2], len(x)),
                   'StepID': x * 2})
# flag rows strictly after the first 24 within each group and add 25 to them
df['StepID_new'] = df['StepID'] + df.groupby('ContextID')['StepID'].transform(
    lambda x: ((x == 24).cumsum() > 0).shift(1, fill_value=0) * 25)
print(df)
Output:
ContextID StepID StepID_new
0 1 1 1
1 1 2 2
2 1 3 3
3 1 19 19
4 1 19 19
5 1 24 24
6 1 1 26
7 1 1 26
8 1 2 27
9 1 2 27
10 1 3 28
11 1 3 28
12 1 2 27
13 1 3 28
14 1 1 26
15 2 1 1
16 2 2 2
17 2 3 3
18 2 19 19
19 2 19 19
20 2 24 24
21 2 1 26
22 2 1 26
23 2 2 27
24 2 2 27
25 2 3 28
26 2 3 28
27 2 2 27
28 2 3 28
29 2 1 26
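As a sketch of the flag used above (values made up): (x == 24).cumsum() > 0 turns True from the first 24 onward, and shifting it by one row leaves the 24 itself untouched:
import pandas as pd
x = pd.Series([3, 19, 24, 1, 2])            # made-up single-group series
seen_24 = (x == 24).cumsum() > 0            # False until the first 24, then True
flag = seen_24.shift(1, fill_value=False)   # delayed one row, so 24 stays as-is
print((x + flag * 25).tolist())             # [3, 19, 24, 26, 27]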
Upvotes: 0
Reputation: 294506
map_ = {1: 25, 2: 26, 3: 27}
d = {}  # Tracks if 19 has been seen yet
for i, c1, c2 in df.itertuples():
    if d.setdefault(c1, False):
        df.at[i, 'col_2'] = map_.get(c2, c2)
    d[c1] |= c2 == 19
np.logical_or
import numpy as np

m = df.col_2.eq(19)
# running OR within each group: True from the first 19 on; XOR removes the 19 itself
m = m.groupby(df.col_1).transform(np.logical_or.accumulate) ^ m
df.assign(col_2=df.col_2 + m * 24)
col_1 col_2
0 1 1
1 1 2
2 1 3
3 1 6
4 1 8
5 1 11
6 1 12
7 1 19
8 1 25
9 1 26
10 1 27
11 2 1
12 2 2
13 2 4
14 2 6
15 2 7
16 2 11
17 2 13
18 2 16
19 2 19
20 2 25
21 2 26
22 2 27
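For reference, np.logical_or.accumulate is a running OR: m becomes True from the first 19 onward within each group, and XOR-ing with the original mask then excludes the 19 row itself. A minimal sketch with a made-up mask:
import numpy as np
m = np.array([False, False, True, False, False])  # True where the value is 19
acc = np.logical_or.accumulate(m)                 # [False False  True  True  True]
print(acc ^ m)                                    # [False False False  True  True]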
Upvotes: 0
Reputation: 88305
One way would be to create a dictionary to replace the values in col_2. In order to replace only those that appear after a 19, GroupBy col_1, check equality against 19, and take the cumsum to perform boolean indexing on the dataframe:
map_ = {1: 25, 2: 26, 3: 27}
cs = df.col_2.eq(19).groupby(df.col_1).cumsum().astype(bool)  # True from each group's first 19 on
update = df.loc[cs].col_2.replace(map_)
df.loc[update.index, 'col_2'] = update
col_1 col_2
0 1 1
1 1 2
2 1 3
3 1 6
4 1 8
5 1 11
6 1 12
7 1 19
8 1 25
9 1 26
10 1 27
11 2 1
12 2 2
13 2 4
14 2 6
15 2 7
16 2 11
17 2 13
18 2 16
19 2 19
20 2 25
21 2 26
22 2 27
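Note that the mask is True from the 19 row itself onward; that is harmless here because 19 is not a key in map_, so replace leaves it unchanged. A minimal sketch with made-up values:
import pandas as pd
col_2 = pd.Series([3, 19, 1, 2])                    # made-up single-group column
cs = col_2.eq(19).cumsum().astype(bool)             # [False, True, True, True]
print(col_2[cs].replace({1: 25, 2: 26}).tolist())   # [19, 25, 26]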
Upvotes: 2
Reputation: 2032
Try the for loop below; it does what is required in your case:
import numpy as np

for i in df['col_1'].unique():
    # position of the first row in this group where col_2 == 19
    ix = np.argwhere((df['col_1'] == i) & (df['col_2'] == 19))
    df.loc[ix[0][0] + 1, 'col_2'] = 25
    df.loc[ix[0][0] + 2, 'col_2'] = 26
    df.loc[ix[0][0] + 3, 'col_2'] = 27
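One caveat: np.argwhere returns positions, while df.loc indexes by label, so this assumes a default RangeIndex. If the frame has been filtered or reordered, resetting the index first keeps the two aligned (a hedged sketch with a made-up frame):
import pandas as pd
df = pd.DataFrame({'col_1': [1, 1, 1], 'col_2': [19, 1, 2]}, index=[7, 8, 9])
df = df.reset_index(drop=True)  # labels now match positions again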
Upvotes: 0
Reputation: 150805
My try:
def fill19(x):
    # x.shift() == 19 marks the row right after each 19
    # rolling(3) extends that mark to the three rows following a 19
    filters = (x.shift() == 19).rolling(3).sum().fillna(0).astype(bool)
    x[filters] += 24
    return x

df.col_2 = df.groupby('col_1').col_2.apply(fill19)
0 1
1 2
2 3
3 6
4 8
5 11
6 12
7 19
8 25
9 26
10 27
11 1
12 2
13 4
14 6
15 7
16 11
17 13
18 16
19 19
20 25
21 26
22 27
Name: col_2, dtype: int64
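To illustrate the masking inside fill19 (values made up): x.shift() == 19 is True on the row right after a 19, and rolling(3).sum() spreads that mark over the next three rows:
import pandas as pd
x = pd.Series([3, 19, 1, 2, 3])                             # made-up series
after_19 = x.shift() == 19                                  # True right after the 19
filters = after_19.rolling(3).sum().fillna(0).astype(bool)
print(filters.tolist())                                     # [False, False, True, True, True]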
Upvotes: 0