Ali Sultan

Reputation: 290

Pandas groupby and assign last of first group to the first of second group

I have a dataframe as below:

import pandas as pd

def addJump(s):
    # pair each row's Group with the next row's Group within the same index block
    return s + '-' + s.shift(-1)

newDF = pd.DataFrame({
   'Group': ['A','A','B','C','B','B','A','C','C','A','D','D'],
   'Value': [1  , 3 , 5 , 10, 3 , 4 , 12, 11, 5 , 6 , 8 , 9 ],
   'Extra': [0  , 7 , 0 , 3 , 0 , 0 , 1 , 0 , 2 , 5 , 0 , 0 ]
}, index=   [ 0 , 0 , 0 , 1 , 1 , 1 , 2 , 2 , 3 , 3 , 3 , 4])
newDF['Jump'] = newDF.groupby(level=0)['Group'].transform(addJump)
newDF

  Group  Value  Extra  Jump
0   A      1    0      A-A
0   A      3    7      A-B
0   B      5    0      NaN
1   C     10    3      C-B
1   B      3    0      B-B
1   B      4    0      NaN
2   A     12    1      A-C
2   C     11    0      NaN
3   C      5    2      C-A
3   A      6    5      A-D
3   D      8    0      NaN
4   D      9    0      NaN

I need to find, within the same index, where the Group changes, and update the Value of the first occurrence of the next Group to be the last Value of the previous Group plus its Extra. For example, looking at index 0, the Value in row 3 should be updated to Value[row 2] + Extra[row 2] = 3 + 7 = 10, so the update looks as follows:

  Group  Value  Extra  Jump
0   A      1    0      A-A
0   A      3    7      A-B
0   B      10   0      NaN

The final result should be as follows:

  Group  Value  Extra  Jump
0   A      1    0      A-A
0   A      3    7      A-B
0   B     10    0      NaN
1   C     10    3      C-B
1   B     13    0      B-B
1   B      4    0      NaN
2   A     12    1      A-C
2   C     12    0      NaN
3   C      5    2      C-A
3   A      7    5      A-D
3   D      12   0      NaN
4   D      9    0      NaN

I can use groupby(level=0) to work on each index group separately, as done with the addJump function. However, I would then need to group again by 'Group' and apply a function that takes the last Value of the first Group and assigns it to the first row of the second Group, and this is exactly where I struggle.
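Roughly the structure I have been trying, just to illustrate where I am stuck (carryOver is only a placeholder name, and its body is exactly the part I cannot write):

def carryOver(idx_df):
    # consecutive runs of the same Group within this index block
    runs = idx_df['Group'].ne(idx_df['Group'].shift()).cumsum()
    # here I would need: first Value of each run = last Value of the
    # previous run + its Extra, but I don't know how to express that
    return idx_df

newDF.groupby(level=0, group_keys=False).apply(carryOver)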

Upvotes: 1

Views: 202

Answers (1)

Ben.T

Reputation: 29635

Because you need the previous value in its already-updated form when it has changed too, it is easier to write a for loop. Create a mask that is True for the rows you want to change, then do a simple if/else inside the loop:

# create a mask to get True for rows meeting the criteria to change
mask = (newDF['Group'].ne(newDF['Group'].shift()) 
        & (newDF.index.to_series().shift() == newDF.index))

# initialize values and return list
newVal = newDF['Value'].iloc[0]
l = []

# now loop, carrying the value from the previous iteration and the Extra column shifted by one row
for val, ext_sh, b in zip(newDF['Value'], newDF['Extra'].shift(fill_value=0), mask):
    if b: # you change these rows
        newVal = newVal + ext_sh
    else: # keep original value
        newVal = val
    l.append(newVal)
# assign the result to a new column (you could write it into 'Value' directly)
newDF['newVal'] = l

print(newDF[['Group', 'Value', 'Extra', 'newVal']])
  Group  Value  Extra  newVal
0     A      1      0       1
0     A      3      7       3
0     B      5      0      10
1     C     10      3      10
1     B      3      0      13
1     B      4      0       4
2     A     12      1      12
2     C     11      0      13
3     C      5      2       5
3     A      6      5       7
3     D      8      0      12
4     D      9      0       9
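As the comment in the loop notes, you can write the result straight back into Value instead of keeping a separate column (if you also want to run the vectorized version below, do the overwrite afterwards, since that version reads the original Value column as well):

# overwrite 'Value' in place rather than keeping a separate newVal column
newDF['Value'] = l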

EDIT: after looking at it a bit more, you can have a vectorized version of it. Using the same mask_ with True for the rows to change: propagate the previous values once they are masked with mask_, add the shifted Extra values cumulatively summed over the wanted rows, and subtract so the cumulative sum restarts outside mask_. Honestly it is harder to maintain than the version with the for loop, but probably faster:

# same mask as in the loop version: Group changes within the same index
mask_ = (newDF['Group'].ne(newDF['Group'].shift()) 
        & (newDF.index.to_series().shift() == newDF.index))

# shifted Extra, accumulated only over the rows to change
s_ = newDF['Extra'].shift().where(mask_).cumsum().ffill().fillna(0)

newDF['newVal2'] = (
    newDF['Value'].mask(mask_).ffill()  # propagate the last unchanged Value
    + s_                                # add the accumulated shifted Extra
    - s_.mask(mask_).ffill()            # subtract the baseline so the sum restarts outside mask_
)
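A quick sanity check (assuming both columns were computed as above) shows the loop and the vectorized version agree:

# the loop result and the vectorized result match on every row
print((newDF['newVal'] == newDF['newVal2']).all())  # True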

Upvotes: 1
