Ivan R
Ivan R

Reputation: 111

Pandas - Count consecutive rows greater than zero on the group level

I would like to translate this R code into Python's pandas:

gg <- c('a', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b')
val <- c(-2, 2, -1, 2, 3, 3, -7, 5, 6, 2, 8)
df = data.table(gg=gg, val=val)
df[, gg_count := seq(.N), by = gg]
df[, counter := gg_count - na.locf(ifelse(val <= 0, gg_count, NA), na.rm=F), by=gg]
print(df)

Output:

  gg val gg_count counter???
   a  -2        1       0
   a   2        2       1
   b  -1        1       0
   a   2        3       2
   b   3        2       1
   b   3        3       2
   a  -7        4       0
   a   5        5       1
   a   6        6       2
   a   2        7       3
   b   8        4       3

Basically, I have columns "gg" and "val" and need to get to the column "counter". In R, I used also helper column "gg_count".

"counter" column, as you can see in the example, counts how many consecutive rows have "val" greater than zero. When negative number pops up, we reset the counter and start counting again. Everything needs to be done on a group level using column "gg".

Upvotes: 0

Views: 319

Answers (1)

Mykola Zotko
Mykola Zotko

Reputation: 17874

You can use:

g = df['val'].le(0).groupby(df['gg']).cumsum() # get groups between negative numbers
df['val'].gt(0).groupby([df['gg'], g]).cumsum()

Output:

0     0
1     1
2     0
3     2
4     1
5     2
6     0
7     1
8     2
9     3
10    3
Name: val, dtype: int64

Upvotes: 1

Related Questions