Andreas
Andreas

Reputation: 61

Pandas dataframe: count max consecutive values

I Have e DataFrame like this:

RTD  Val   
BA    2    
BA    88    
BA    15
BA    67    
BA    83   
BA    77  
BA    79  
BA    90   
BA    1   
BA    14   

First:

df['count'] = df.Val > 15
print(df) 

I get as a result:

  RTD  Val  count
0  BA    2  False
1  BA   88  True
2  BA   15  False
3  BA   67  True
4  BA   83  True
5  BA   77  True
6  BA   79  True
7  BA   90  True
8  BA    1  False
9  BA   14  False

Now, to count the maximum consecutive occurrences I use:

def rolling_count(val):
    if val == rolling_count.previous:
        rolling_count.count +=1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count
rolling_count.count = 0 #static variable
rolling_count.previous = None #static variable

ddf= df['count'].apply(rolling_count)
print (max(ddf))

I get as result: 5.

My answer is: To count the max occurrences consecutive of False, how i should do?

The correct value is equal to 2.

I am interested to know the maximum of consecutive occurrences other than True, for Val > 15 and conversely

Upvotes: 2

Views: 1995

Answers (2)

lmo
lmo

Reputation: 38500

Here is a longer method that coerces count to be an integer rather then boolean by adding 0. The absolute difference indicates changes in the boolean value, and the first value is filled to be 1.

The result of this change Series is evaluated as to whether elements are greater than 0 in the 'bools' variable and the corresponding elements from df['count'] are extracted.

The results of the change vector are used with cumsum to form IDs which are used in groupby in the runs variable. Counts of each ID are then contstructed in the runs variable.

countDf = DataFrame({'bools': list(df['count'][(df['count'] + 0)
                       .diff().abs().fillna(1) > 0]),
                     'runs': list(df['Val'].groupby((df['count'] + 0)
                       .diff().abs().fillna(1).cumsum()).count())})

countDf

   bools  runs
0  False     1
1   True     1
2  False     1
3   True     5
4  False     2

You can extract the maximum runs using standard subsetting like

countDf[countDf.bools == False]['runs'].max()
2

countDf[countDf.bools == True]['runs'].max()
5

Upvotes: 2

piRSquared
piRSquared

Reputation: 294218

This is my attempt

gt15 = df.Val.gt(15)
counts = df.groupby([gt15, (gt15 != gt15.shift()) \
           .cumsum()]).size().rename_axis(['>15', 'grp'])
counts

>15    grp
False  1      1
       3      1
       5      2
True   2      1
       4      5
dtype: int64

counts.loc[False].max()

2

Upvotes: 1

Related Questions