Pandas - Count consecutive rows greater than zero on the group level

Question

I would like to translate this R code into Python's pandas:

gg <- c('a', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b')
val <- c(-2, 2, -1, 2, 3, 3, -7, 5, 6, 2, 8)
df = data.table(gg=gg, val=val)
df[, gg_count := seq(.N), by = gg]
df[, counter := gg_count - na.locf(ifelse(val <= 0, gg_count, NA), na.rm=F), by=gg]
print(df)

Output:

  gg val gg_count counter???
   a  -2        1       0
   a   2        2       1
   b  -1        1       0
   a   2        3       2
   b   3        2       1
   b   3        3       2
   a  -7        4       0
   a   5        5       1
   a   6        6       2
   a   2        7       3
   b   8        4       3

Basically, I have columns "gg" and "val" and need to get to the column "counter". In R, I used also helper column "gg_count".

"counter" column, as you can see in the example, counts how many consecutive rows have "val" greater than zero. When negative number pops up, we reset the counter and start counting again. Everything needs to be done on a group level using column "gg".

Mykola Zotko · Accepted Answer

You can use:

g = df['val'].le(0).groupby(df['gg']).cumsum() # get groups between negative numbers
df['val'].gt(0).groupby([df['gg'], g]).cumsum()

Output:

0     0
1     1
2     0
3     2
4     1
5     2
6     0
7     1
8     2
9     3
10    3
Name: val, dtype: int64

Pandas - Count consecutive rows greater than zero on the group level

Answers (1)

Related Questions