Reputation: 290
I have a dataframe as below:
def addJump(s):
return s+'-'+s.shift(-1)
newDF = pd.DataFrame({
'Group': ['A','A','B','C','B','B','A','C','C','A','D','D'],
'Value': [1 , 3 , 5 , 10, 3 , 4 , 12, 11, 5 , 6 , 8 , 9 ],
'Extra': [0 , 7 , 0 , 3 , 0 , 0 , 1 , 0 , 2 , 5 , 0 , 0 ]
}, index= [ 0 , 0 , 0 , 1 , 1 , 1 , 2 , 2 , 3 , 3 , 3 , 4])
newDF['Jump'] = newDF.groupby(level=0)['Group'].transform(addJump)
newDF
Group Value Extra Jump
0 A 1 0 A-A
0 A 3 7 A-B
0 B 5 0 NaN
1 C 10 3 C-B
1 B 3 0 B-B
1 B 4 0 NaN
2 A 12 1 A-C
2 C 11 0 NaN
3 C 5 2 C-A
3 A 6 5 A-D
3 D 8 0 NaN
4 D 9 0 NaN
I need to find within the same index
, where the group changes and update Value
of the first occurrence of the next Group
to be the last Value
of the previous Group
. For example, if we look at index 0, Value
in Row 3 Shall be updated to be the Value[row 2] + Extra[row 2]=10
, and the update will be like follows.
Group Value Extra Jump
0 A 1 0 A-A
0 A 3 7 A-B
0 B 10 0 NaN
The final Result shall be as follows:
Group Value Extra Jump
0 A 1 0 A-A
0 A 3 7 A-B
0 B 10 0 NaN
1 C 10 3 C-B
1 B 13 0 B-B
1 B 4 0 NaN
2 A 12 1 A-C
2 C 12 0 NaN
3 C 5 2 C-A
3 A 7 5 A-D
3 D 12 0 NaN
4 D 9 0 NaN
I can use groupby(level=0)
to work on each index group separately as done with addJump Function. However, I will need to again groupby('Group')
and apply the next function where I call the last of the first Group
and the assign it to the first of the second Group
. And this is exactly where I struggle.
Upvotes: 1
Views: 202
Reputation: 29635
Because you need the previous value updated in case it changes too, it is easier to write a for loop. Create a mask with True for the rows you want to change, then do a simple if else in a loop for
# create a mask to get True for rows meeting the criteria to change
mask = (newDF['Group'].ne(newDF['Group'].shift())
& (newDF.index.to_series().shift() == newDF.index))
# initialize values and return list
newVal = newDF['Value'].iloc[0]
l = []
# now loop keep the value from the previous loop and extra coumn shifted
for val, ext_sh, b in zip(newDF['Value'], newDF['Extra'].shift(fill_value=0), mask):
if b: # you change these rows
newVal = newVal + ext_sh
else: # keep original value
newVal = val
l.append(newVal)
# assign the result to a column, can do it in Value directly
newDF['newVal'] = l
print(newDF)
Group Value Extra newVal
0 A 1 0 1
0 A 3 7 3
0 B 5 0 10
1 C 10 3 10
1 B 3 0 13
1 B 4 0 4
2 A 12 1 12
2 C 11 0 13
3 C 5 2 5
3 A 6 5 7
3 D 8 0 12
4 D 9 0 9
EDIT: after looking a bit, you can have a vectotize version of it, using the mask_
with True for the rows to change, then propagate previous values once mask
the mask_
, and add the extra values shifted and cumulative sum for the wanted rows, remove to restart the cumulative sum when not in mask_
. honestly it is harder to maintain that the version with for loop, but probably faster
mask_ = (newDF['Group'].ne(newDF['Group'].shift())
& (newDF.index.to_series().shift() == newDF.index))
s_ = newDF['Extra'].shift().where(mask_).cumsum().ffill().fillna(0)
newDF['newVal2'] = (
newDF['Value'].mask(mask_).ffill()
+ s_
- s_.mask(mask_).ffill()
)
Upvotes: 1