Scooter8275
Scooter8275

Reputation: 33

Need to count repeating, consecutive values in python dataframe within a groupby

df = pd.DataFrame({'site':[1,1,1,1,1,1,1,1,1,1], 'parm':[8,8,8,8,8,9,9,9,9,9],
                   'date':[1,2,3,4,5,1,2,3,4,5], 'obs':[1,1,2,3,3,3,5,5,6,6]})

Output

   site  parm  date  obs
0     1     8     1    1
1     1     8     2    1
2     1     8     3    2
3     1     8     4    3
4     1     8     5    3
5     1     9     1    3
6     1     9     2    5
7     1     9     3    5
8     1     9     4    6
9     1     9     5    6

I want to count repeating, sequential "obs" values within a "site" and "parm". I have this code which is close:

df['consecutive'] = df.parm.groupby((df.obs != df.obs.shift()).cumsum()).transform('size')

Output

   site  parm  date  obs  consecutive
0     1     8     1    1            2
1     1     8     2    1            2
2     1     8     3    2            1
3     1     8     4    3            3
4     1     8     5    3            3
5     1     9     1    3            3
6     1     9     2    5            2
7     1     9     3    5            2
8     1     9     4    6            2
9     1     9     5    6            2

It creates the new column with the count. The gap is when the parm changes from 8 to 9 it includes the parm 9 in the parm 8 count. The expected output is:

   site  parm  date  obs  consecutive
0     1     8     1    1            2
1     1     8     2    1            2
2     1     8     3    2            1
3     1     8     4    3            2
4     1     8     5    3            2
5     1     9     1    3            1
6     1     9     2    5            2
7     1     9     3    5            2
8     1     9     4    6            2
9     1     9     5    6            2

Upvotes: 3

Views: 143

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

You need to throw site, parm as indicated in the question into groupby:

df['consecutive'] = (df.groupby([df.obs.ne(df.obs.shift()).cumsum(),
                                 'site', 'parm']
                                )
                     ['obs'].transform('size')
                     )

Output:

   site  parm  date  obs  consecutive
0     1     8     1    1            2
1     1     8     2    1            2
2     1     8     3    2            1
3     1     8     4    3            2
4     1     8     5    3            2
5     1     9     1    3            1
6     1     9     2    5            2
7     1     9     3    5            2
8     1     9     4    6            2
9     1     9     5    6            2

Upvotes: 4

Related Questions