meliksahturker

Reputation: 1504

Counting the number of consecutive values that meets a condition (Pandas Dataframe)

I posted a question about a related problem two days ago and, thankfully, got an answer.

My data has 20 rows and 2500 columns. Each column is a unique product, and the rows form a time series of measurement results, so each of the 2500 products is measured 20 times.

This time I want to know for how many consecutive rows a measurement stays above a specific threshold. In other words, I want to count the number of consecutive values that are above a given value, let's say 5.

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]

The values above 5 form two runs here: 6, 8, 7 and 6, 10. According to the definition above, I should get NumofConsFeature = 3 as the result (taking the maximum when more than one run meets the condition).

I thought of filtering with .gt, getting the indexes, and then looping over them to detect consecutive index numbers, but I couldn't make it work.

In a second phase, I'd like to know the index of the first value of the longest run. For the above example, that would be 3 (counting from 1; it is index 2 with pandas' zero-based indexing). I have no idea how to approach this one.

Thanks in advance.

Upvotes: 9

Views: 13201

Answers (5)

Cord Kaldemeyer

Reputation: 6917

Here's how I did it using numpy:

import pandas as pd
import numpy as np


df = pd.DataFrame({"a":[1, 2, 6, 7, 8, 3, 2, 3, 6, 10, 2, 1, 0, 2]})


consecutive_steps = 2
marginal_price = 5

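# One boolean Series per offset: is the value i rows ahead below the price?
assertions = [(df.loc[:, "a"].shift(-i) < marginal_price) for i in range(consecutive_steps)]
# True where all consecutive_steps values starting at that row are below it
condition = np.all(assertions, axis=0)

consecutive_count = df.loc[condition, :].count()
print(consecutive_count)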

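which yields 6 for column a: the number of rows at which consecutive_steps consecutive values below marginal_price start. Change < to > to test for values above the threshold instead.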
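The same shifted-comparison idea can be pointed at the original question by flipping the comparison to > and growing the window until no position qualifies. This is only a minimal sketch, assuming the question's sample list and a threshold of 5; run_starts is a hypothetical helper name, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

# Sample data and threshold taken from the question
s = pd.Series([1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2])
threshold = 5


def run_starts(s, threshold, n):
    """Positions where n consecutive values, starting there, all exceed threshold."""
    checks = [s.shift(-i) > threshold for i in range(n)]
    return np.flatnonzero(np.all(checks, axis=0))


# Grow the window until no position qualifies any more.
longest = 0
while len(run_starts(s, threshold, longest + 1)) > 0:
    longest += 1

# Start of the first run that reaches the longest length (None if no value qualifies)
first_start = int(run_starts(s, threshold, longest)[0]) if longest else None
print(longest, first_start)  # 3 2
```

This re-checks the whole Series for every window size, which is acceptable for 20 rows but would be wasteful for long series.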

Upvotes: 0

Bart

Reputation: 310

Here's another answer using only Pandas functions:

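import pandas as pd

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
a = pd.DataFrame(A, columns=['foo'])
a['is_large'] = (a.foo > 5)
# Increment a run id every time is_large flips
a['crossing'] = (a.is_large != a.is_large.shift()).cumsum()
# Within each run, count how many values remain (including the current one)
a['count'] = a.groupby(['is_large', 'crossing']).cumcount(ascending=False) + 1
# Runs at or below the threshold do not count
a.loc[~a.is_large, 'count'] = 0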

which gives

    foo  is_large  crossing  count
0     1     False         1      0
1     2     False         1      0
2     6      True         2      3
3     8      True         2      2
4     7      True         2      1
5     3     False         3      0
6     2     False         3      0
7     3     False         3      0
8     6      True         4      2
9    10      True         4      1
10    2     False         5      0
11    1     False         5      0
12    0     False         5      0
13    2     False         5      0

From there on you can easily find the maximum and its index.
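One way to take that last step, as a sketch rather than the only option: because count holds the remaining run length at each row, its maximum is the longest run and the row where that maximum first occurs is the start of that run. The names longest_run and first_index are introduced here for illustration.

```python
import pandas as pd

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
a = pd.DataFrame(A, columns=["foo"])
a["is_large"] = a.foo > 5
a["crossing"] = (a.is_large != a.is_large.shift()).cumsum()
a["count"] = a.groupby(["is_large", "crossing"]).cumcount(ascending=False) + 1
a.loc[~a.is_large, "count"] = 0

# Longest run above the threshold
longest_run = int(a["count"].max())
# idxmax returns the first row holding the maximum, i.e. the start of that run
first_index = int(a["count"].idxmax())

print(longest_run, first_index)  # 3 2
```

If no value exceeds the threshold, count is 0 everywhere and idxmax simply returns the first row, so check longest_run > 0 before trusting first_index.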

Upvotes: 6

Mehmet nuri

Reputation: 948

There is a simple way to do that.
Let's say your list is A = [1, 2, 6, 8, 7, 6, 8, 3, 2, 3, 6, 10, 6, 7, 8, 2, 1, 0, 2]
and you want to find how many runs of 5 consecutive values are at least 6. Here the answer is 2: the runs 6, 8, 7, 6, 8 and 6, 10, 6, 7, 8. (A strict > 6 comparison would find none, because both runs contain a 6.) In Python with pandas we do that like below:

import pandas as pd

df = pd.DataFrame({"wanted_row": A})

# True where a value and the four values after it are all at least 6
condition = (df.wanted_row >= 6) & \
            (df.wanted_row.shift(-1) >= 6) & \
            (df.wanted_row.shift(-2) >= 6) & \
            (df.wanted_row.shift(-3) >= 6) & \
            (df.wanted_row.shift(-4) >= 6)

consecutive_count = int(condition.sum())  # 2
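The chained shift comparisons count every window of 5 qualifying values, so a run of 7 would be counted three times, and the window length is hard-coded. A hedged alternative is to label each run and count the runs that are long enough. The sketch below assumes "at least 6" as the condition, since that is what gives 2 for the list above; count_runs is a hypothetical helper name.

```python
import pandas as pd


def count_runs(values, threshold, min_len):
    """Count runs of at least min_len consecutive values that are >= threshold."""
    mask = pd.Series(values) >= threshold
    # A new run id starts every time the mask flips between True and False
    run_id = (mask != mask.shift()).cumsum()
    # Summing the 0/1 mask per run gives the run length (0 for non-qualifying runs)
    run_lengths = mask.astype(int).groupby(run_id).sum()
    return int((run_lengths >= min_len).sum())


A = [1, 2, 6, 8, 7, 6, 8, 3, 2, 3, 6, 10, 6, 7, 8, 2, 1, 0, 2]
print(count_runs(A, 6, 5))  # 2
```

Each run is counted once regardless of how far it exceeds min_len, which is the main difference from the windowed count.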

Upvotes: 3

Divakar

Reputation: 221754

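Here's a vectorized NumPy approach with maxisland_start_len_mask, which returns the start index and length of the longest run in each column -

import numpy as np

# https://stackoverflow.com/a/52718782/ @Divakar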
def maxisland_start_len_mask(a, fillna_index = -1, fillna_len = 0):
    # a is a boolean array

    pad = np.zeros(a.shape[1],dtype=bool)
    mask = np.vstack((pad, a, pad))

    mask_step = mask[1:] != mask[:-1]
    idx = np.flatnonzero(mask_step.T)
    island_starts = idx[::2]
    island_lens = idx[1::2] - idx[::2]
    n_islands_percol = mask_step.sum(0)//2

    bins = np.repeat(np.arange(a.shape[1]),n_islands_percol)
    scale = island_lens.max()+1

    scaled_idx = np.argsort(scale*bins + island_lens)
    grp_shift_idx = np.r_[0,n_islands_percol.cumsum()]
    max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:]-1]]

    max_island_percol_start = max_island_starts%(a.shape[0]+1)

    valid = n_islands_percol!=0
    cut_idx = grp_shift_idx[:-1][valid]
    max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)

    out_len = np.full(a.shape[1], fillna_len, dtype=int)
    out_len[valid] = max_island_percol_len
    out_index = np.where(valid,max_island_percol_start,fillna_index)
    return out_index, out_len

def maxisland_start_len(a, trigger_val, comp_func=np.greater):
    # a is 2D array as the data
    mask = comp_func(a,trigger_val)
    return maxisland_start_len_mask(mask, fillna_index = -1, fillna_len = 0)

Sample run -

In [169]: a
Out[169]: 
array([[ 1,  0,  3],
       [ 2,  7,  3],
       [ 6,  8,  4],
       [ 8,  6,  8],
       [ 7,  1,  6],
       [ 3,  7,  8],
       [ 2,  5,  8],
       [ 3,  3,  0],
       [ 6,  5,  0],
       [10,  3,  8],
       [ 2,  3,  3],
       [ 1,  7,  0],
       [ 0,  0,  4],
       [ 2,  3,  2]])

# Per column results
In [170]: row_index, length = maxisland_start_len(a, 5)

In [172]: row_index
Out[172]: array([2, 1, 3])

In [173]: length
Out[173]: array([3, 3, 4])
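To sanity-check those numbers, the per-column results can be reproduced with a much slower but easier-to-read loop. This is an illustrative cross-check, not a replacement for the vectorized function; longest_run_per_column is a hypothetical name, and it keeps the first run when several share the maximum length, which may differ from how the vectorized version breaks ties.

```python
from itertools import groupby

import numpy as np


def longest_run_per_column(a, threshold):
    """Start row and length of the longest run above threshold, per column."""
    starts, lengths = [], []
    for col in (a > threshold).T:  # one boolean column at a time
        best_start, best_len, pos = -1, 0, 0  # -1 / 0 when no value qualifies
        for is_above, group in groupby(col):
            n = sum(1 for _ in group)  # length of this run of equal flags
            if is_above and n > best_len:
                best_start, best_len = pos, n
            pos += n
        starts.append(best_start)
        lengths.append(best_len)
    return np.array(starts), np.array(lengths)


a = np.array([[1, 0, 3], [2, 7, 3], [6, 8, 4], [8, 6, 8], [7, 1, 6],
              [3, 7, 8], [2, 5, 8], [3, 3, 0], [6, 5, 0], [10, 3, 8],
              [2, 3, 3], [1, 7, 0], [0, 0, 4], [2, 3, 2]])

row_index, length = longest_run_per_column(a, 5)
print(row_index, length)  # [2 1 3] [3 3 4]
```

For a DataFrame like the one in the question, passing df.to_numpy() as a should work the same way, with one result per product column.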

Upvotes: 0

andrew_reece

Reputation: 21284

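You can apply diff() to your Series, and then count the consecutive entries where the difference is 1 and the actual value is above your cutoff. The largest count is the maximum number of consecutive values. Note that this reads "consecutive" as values that increase by exactly 1 (6, 7, 8 in the sample below), so a run such as 6, 8, 7 from the question would not be counted in full.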

First compute diff():

df = pd.DataFrame({"a":[1, 2, 6, 7, 8, 3, 2, 3, 6, 10, 2, 1, 0, 2]})
df['b'] = df.a.diff()

df
     a    b
0    1  NaN
1    2  1.0
2    6  4.0
3    7  1.0
4    8  1.0
5    3 -5.0
6    2 -1.0
7    3  1.0
8    6  3.0
9   10  4.0
10   2 -8.0
11   1 -1.0
12   0 -1.0
13   2  2.0

Now count consecutive sequences:

above = 5
n_consec = 1
max_n_consec = 1

for a, b in df.values[1:]:
    if a > above and b == 1:
        n_consec += 1
    else:  # check for new max, then start again from 1
        max_n_consec = max(n_consec, max_n_consec)
        n_consec = 1

# also account for a run that is still open when the loop ends
max_n_consec = max(n_consec, max_n_consec)

max_n_consec
3
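If "consecutive" is meant as in the question, i.e. neighbouring rows that are all above the cutoff regardless of how much they differ, the diff() column is not needed. The following is a plain-Python sketch under that assumption; longest_run_above is a hypothetical name, and it also reports the zero-based start index asked for in the second phase of the question.

```python
def longest_run_above(values, above):
    """Return (length, start index) of the longest run of values > above."""
    best_len, best_start = 0, None
    run_len, run_start = 0, None
    for i, v in enumerate(values):
        if v > above:
            if run_len == 0:
                run_start = i  # a new run begins here
            run_len += 1
            # Updating inside the loop also covers a run that ends at the last row
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0
    return best_len, best_start


A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
print(longest_run_above(A, 5))  # (3, 2)
```

When several runs share the maximum length, the first one is reported; with no value above the cutoff the result is (0, None).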

Upvotes: 0
