Reputation: 283
I have a dataframe called merged that looks like this. The index is placa_encoded.
placa_encoded codcet break
561.0 490101113 False
561.0 480481112 False
660.0 400081122 False
660.0 420121123 True
660.0 420141122 False
660.0 420151122 False
660.0 420171122 False
660.0 420171113 True
660.0 420161112 False
660.0 420151112 False
660.0 420121112 False
660.0 420111112 False
...
I want to split this dataframe into chunks: one group per placa_encoded value, with a new group starting at each row where break=True (aside from the very first group, obviously), like this:
placa_encoded codcet break
[561.0]
561.0 490101113 False
561.0 480481112 False
[660.0]
660.0 400081122 False
660.0 420121123 True
660.0 420141122 False
660.0 420151122 False
660.0 420171122 False
660.0 420171113 True
660.0 420161112 False
660.0 420151112 False
660.0 420121112 False
660.0 420111112 False
...
I've tried something like this so far, inspired by this answer, but it has not worked the way I want. Instead, it grouped the rows into runs of True and False within each placa_encoded.
merged['ne'] = merged['break'].ne(merged['break'].shift()).cumsum()
merged.groupby(['placa_encoded', merged['ne']], sort=False)
Upvotes: 2
Reputation: 30920
You can try this (if break is not bool, you need .astype(bool)):
for name, group in merged.groupby(['placa_encoded', merged['break'].cumsum()], sort=False):
    print('-'*50)
    print(name)
    print(group)
--------------------------------------------------
(561.0, 0)
placa_encoded codcet break
0 561.0 490101113 False
1 561.0 480481112 False
--------------------------------------------------
(660.0, 0)
placa_encoded codcet break
2 660.0 400081122 False
--------------------------------------------------
(660.0, 1)
placa_encoded codcet break
3 660.0 420121123 True
4 660.0 420141122 False
5 660.0 420151122 False
6 660.0 420171122 False
--------------------------------------------------
(660.0, 2)
placa_encoded codcet break
7 660.0 420171113 True
8 660.0 420161112 False
9 660.0 420151112 False
10 660.0 420121112 False
11 660.0 420111112 False
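For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. It rebuilds the sample rows from the question with placa_encoded as an ordinary column (an assumption, since the question has it as the index), and the names segment and chunks are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; placa_encoded is kept as a column here.
merged = pd.DataFrame({
    "placa_encoded": [561.0] * 2 + [660.0] * 10,
    "codcet": [490101113, 480481112, 400081122, 420121123, 420141122,
               420151122, 420171122, 420171113, 420161112, 420151112,
               420121112, 420111112],
    "break": [False, False, False, True, False, False,
              False, True, False, False, False, False],
})

# Running count of True values: it goes up by one on each break row.
segment = merged["break"].astype(bool).cumsum()

# Collect each (placa_encoded, segment) chunk into a dict of DataFrames.
chunks = {
    key: group
    for key, group in merged.groupby(["placa_encoded", segment], sort=False)
}

for key, group in chunks.items():
    print(key, len(group))
```

Note that the counter is computed over the whole frame and is not reset for each placa_encoded. The (placa_encoded, counter) pairs still identify each chunk uniquely, which is all the grouping needs.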
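Detail: the second group key is the running count of True values in break, so it increases by one at each break row.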
merged['break'].cumsum()
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 2
8 2
9 2
10 2
11 2
Name: break, dtype: int64
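To see why the attempt in the question behaves differently, it may help to compare the two keys side by side. The sketch below uses only the break column from the sample data; run_key and seg_key are illustrative names, not anything from the original code.

```python
import pandas as pd

brk = pd.Series([False, False, False, True, False, False,
                 False, True, False, False, False, False], name="break")

# Key from the question: increments whenever the value changes,
# including the change from True back to False.
run_key = brk.ne(brk.shift()).cumsum()

# Key from this answer: increments only on rows where break is True.
seg_key = brk.cumsum()

print(run_key.tolist())
print(seg_key.tolist())
```

With the ne/shift key, every True row ends up alone in its own run, followed by a separate run of False rows, giving five groups here. With the plain cumsum, each True row opens a segment that also contains the False rows after it, giving three.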
Upvotes: 2