user3042850

Reputation: 317

Pandas assign cumulative count for consecutive values in a column

This is my data:

print(n0data)

                          FULL_MPID            DateTime  EquipID  count
Index                                                                  
1      5092761672035390000000000000 2018-11-28 00:36:00     1296      1
2      5092761672035390000000000000 2018-11-28 00:37:00     1634      2
3      5092761672035390000000000000 2018-11-28 13:36:00     1296      3
4      5092761672035390000000000000 2018-11-28 13:38:00     1634      4
5      5092761672035390000000000000 2018-11-29 17:37:00     1290      5
6      5092761672035390000000000000 2018-11-29 17:37:00     1634      6
7      5092761672035390000000000000 2018-11-30 21:23:00     1290      7
8      5092761672035390000000000000 2018-11-30 21:24:00     1634      8
9      5092761672035390000000000000 2018-12-02 09:37:00     1296      9
10     5092761672035390000000000000 2018-12-02 09:39:00     1634     10
11     5092761672035390000000000000 2018-12-02 09:39:00     1634     11
12     5092761672035390000000000000 2018-12-03 11:55:00     1290     12
13     5092761672035390000000000000 2018-12-03 12:02:00     1634     13
14     5092761672035390000000000000 2018-12-06 12:22:00     1290     14
15     5092761672035390000000000000 2018-12-06 12:22:00     1634     15
16     5092761672035390000000000000 2018-12-06 12:22:00     1634     16
17     5092761672035390000000000000 2018-12-06 12:23:00     1634     17
18     5092761672035390000000000000 2018-12-06 12:23:00     1634     18
19     5092761672035390000000000000 2018-12-06 12:23:00     1634     19
20     5092761672035390000000000000 2018-12-06 12:23:00     1634     20
21     5092761672035390000000000000 2018-12-06 12:23:00     1634     21
22     5092761672035390000000000000 2018-12-09 05:51:00     1290     22

I create an ecount column with the following groupby command:

n0data['ecount'] = n0data.groupby(['EquipID','FULL_MPID']).cumcount() + 1

The data is sorted by time, and I am trying to identify when EquipID changes over from one value to another.

Ecount is supposed to be:

[image: ecount right]

When the value in the EquipID column changes from one row to the next, ecount should reset. However, if EquipID does not change, as in rows 15-21, ecount should continue counting. I thought this was what the groupby delivered too...

Upvotes: 2

Views: 1093

Answers (1)

cs95

Reputation: 402263

You can use the shift and cumsum trick before groupby:

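# df is n0data; v is True wherever EquipID differs from the previous row
v = df.EquipID.ne(df.EquipID.shift())
# v.cumsum() labels each consecutive run; cumcount() numbers the rows in it
v.groupby(v.cumsum()).cumcount() + 1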

Index
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    2
12    1
13    1
14    1
15    1
16    2
17    3
18    4
19    5
20    6
21    7
22    1
dtype: int64
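
To see why this works, here is a minimal self-contained sketch of the same trick, broken into steps. The toy frame and the names changed and run_id are made up for illustration; the EquipID values copy rows 9-16 of the question's data. The original groupby(['EquipID','FULL_MPID']) counts every occurrence of an EquipID across the whole frame, whereas grouping on the run label restarts the count each time the value changes.

```python
import pandas as pd

# Toy frame (illustrative only); EquipID values mirror rows 9-16 of the question.
df = pd.DataFrame({"EquipID": [1296, 1634, 1634, 1290, 1634, 1290, 1634, 1634]})

# True wherever EquipID differs from the previous row.
# The first row compares against NaN, so it is always True.
changed = df["EquipID"].ne(df["EquipID"].shift())

# The running sum of the flags gives each consecutive run its own label.
run_id = changed.cumsum()

# Number the rows within each run, starting from 1.
df["ecount"] = df.groupby(run_id).cumcount() + 1

print(df["ecount"].tolist())  # [1, 1, 2, 1, 1, 1, 1, 2]
```

The result matches rows 9-16 of the output above. If the real frame held more than one FULL_MPID, it would probably make sense to also start a new run whenever FULL_MPID changes, but that goes beyond the data shown in the question.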

Upvotes: 2
