Tanishq Kumar

Reputation: 283

Pandas groupby change in value of a column

I have a dataframe called merged that looks like this. The index is placa_encoded.

placa_encoded   codcet          break 
    
561.0           490101113       False
561.0           480481112       False
660.0           400081122       False
660.0           420121123       True
660.0           420141122       False
660.0           420151122       False
660.0           420171122       False
660.0           420171113       True
660.0           420161112       False
660.0           420151112       False
660.0           420121112       False
660.0           420111112       False
...

I want to split this dataframe into groups: first by placa_encoded, and then, within each placa_encoded, a new group should start at every row where break is True (the very first group of each placa_encoded naturally starts without one). The result should look like this:

placa_encoded   codcet          break 

[561.0] 
561.0           490101113       False
561.0           480481112       False

[660.0] 
660.0           400081122       False

660.0           420121123       True
660.0           420141122       False
660.0           420151122       False
660.0           420171122       False

660.0           420171113       True
660.0           420161112       False
660.0           420151112       False
660.0           420121112       False
660.0           420111112       False
...

I've tried the following so far, inspired by this answer, but it doesn't do what I want: for each placa_encoded it groups the rows into consecutive runs of True and False instead.

merged['ne'] = merged['break'].ne(merged['break'].shift()).cumsum()
merged.groupby(['placa_encoded', merged['ne']], sort=False)
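To make the behaviour of this attempt concrete, here is a minimal sketch on the first six rows of the sample data (rebuilt by hand, with placa_encoded as an ordinary column, which is an assumption). The expression `ne(shift()).cumsum()` bumps the counter every time the value of break changes, so a True row ends up in a run of its own and the False rows after it fall into the next run.

```python
import pandas as pd

# First six rows of the sample data, rebuilt by hand for illustration.
merged = pd.DataFrame({
    "placa_encoded": [561.0, 561.0, 660.0, 660.0, 660.0, 660.0],
    "codcet": [490101113, 480481112, 400081122,
               420121123, 420141122, 420151122],
    "break": [False, False, False, True, False, False],
})

# The counter increases whenever `break` differs from the previous row,
# i.e. on False -> True *and* on True -> False.
run_id = merged["break"].ne(merged["break"].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 3, 3]
```

The True row (run 2) is separated from the False rows that should follow it in the same chunk (run 3), which is the "buckets of True and False" effect described above.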

Upvotes: 2

Views: 86

Answers (1)

ansev

Reputation: 30920

You can group by placa_encoded together with the cumulative sum of break (if break is not of bool dtype, convert it first with .astype(bool)):

for name, group in merged.groupby(['placa_encoded', merged['break'].cumsum()], sort=False):
    print('-'*50)
    print(name)
    print(group)
    
--------------------------------------------------
(561.0, 0)
   placa_encoded     codcet  break
0          561.0  490101113  False
1          561.0  480481112  False
--------------------------------------------------
(660.0, 0)
   placa_encoded     codcet  break
2          660.0  400081122  False
--------------------------------------------------
(660.0, 1)
   placa_encoded     codcet  break
3          660.0  420121123   True
4          660.0  420141122  False
5          660.0  420151122  False
6          660.0  420171122  False
--------------------------------------------------
(660.0, 2)
    placa_encoded     codcet  break
7           660.0  420171113   True
8           660.0  420161112  False
9           660.0  420151112  False
10          660.0  420121112  False
11          660.0  420111112  False
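A note on the dtype caveat: the sketch below uses three small hypothetical Series (not the asker's data) to show when a conversion matters. Integer 0/1 flags can be summed directly. Flags stored as the strings "True"/"False" are a different case: `.astype(bool)` on an object column tests truthiness, and every non-empty string is truthy, so an explicit mapping is one safer option there.

```python
import pandas as pd

# Hypothetical flag columns, for illustration only.
as_int = pd.Series([0, 1, 0])
as_text = pd.Series(["False", "True", "False"], dtype=object)

# 0/1 integers already accumulate the way booleans do.
print(as_int.cumsum().tolist())          # [0, 1, 1]

# Non-empty strings are all truthy, so this loses the information.
print(as_text.astype(bool).tolist())     # [True, True, True]

# Mapping the text to real booleans keeps the intended meaning.
as_bool = as_text.map({"True": True, "False": False})
print(as_bool.cumsum().tolist())         # [0, 1, 1]
```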

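Detail: each True adds 1 to the running sum, so every row carries the number of breaks seen so far, and that number serves as the chunk label within a placa_encoded.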

merged['break'].cumsum()

0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     2
8     2
9     2
10    2
11    2
Name: break, dtype: int64
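For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the twelve sample rows and collects the chunks instead of printing them. It assumes placa_encoded is an ordinary column (as in the printed output above); the names `segment`, `chunks` and `chunk_id` are made up for illustration. Note that the cumulative sum runs over the whole frame and is not reset per placa_encoded, so segment numbers for a later placa_encoded need not start at 0; the pair (placa_encoded, segment) still identifies each chunk.

```python
import pandas as pd

# The twelve sample rows from the question, rebuilt by hand.
merged = pd.DataFrame({
    "placa_encoded": [561.0] * 2 + [660.0] * 10,
    "codcet": [490101113, 480481112, 400081122, 420121123,
               420141122, 420151122, 420171122, 420171113,
               420161112, 420151112, 420121112, 420111112],
    "break": [False, False, False, True, False, False,
              False, True, False, False, False, False],
})

# Running count of breaks seen so far; increases by 1 at each True row.
segment = merged["break"].cumsum().rename("segment")

grouped = merged.groupby(["placa_encoded", segment], sort=False)

# One sub-frame per (placa_encoded, segment) pair.
chunks = {key: group for key, group in grouped}
sizes = [len(group) for group in chunks.values()]
print(sizes)  # [2, 1, 4, 5]

# Optionally, a single flat chunk number per row.
chunk_id = grouped.ngroup()
print(chunk_id.tolist())  # [0, 0, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

Keeping the segment as a separate Series rather than a new column leaves `merged` unchanged; assigning it as a column and grouping by the two column names would work just as well.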

Upvotes: 2
