Pedro de Sá
Pedro de Sá

Reputation: 780

Count consecutive values above threshold for all columns in dataframe

I have the following df:

       A      B      C
0     47     50     22
1     47     53     28
2     47     65     33
3     47     62     33
4     47     62     33
5     47     62     33
6     47     62     33
7     47     62     33

I'd like to create an array with the amount of times that any value repeats itself more than 5 times and sum it. In the above example the output that I'd like it would be:

output = [8 5 6]

Is there a way to do it using pandas methods? I almost got the correct output but using for loops and it takes too long since I have +1000 columns and almost 7000 rows.

Upvotes: 1

Views: 1012

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150825

IIUC, you want to find the consecutive blocks within each column and add their sizes up:

# compare cells with previous 
# and stack into a series
s = df.eq(df.shift()).melt()

# groups of consecutive equal values
groups = (~s['value']).groupby(s['variable']).cumsum()

(s.groupby(['variable', groups])
  .size()
  .loc[lambda x: x>=5]          # stole from BEN's answer
  .groupby(level=0).sum()       # replace this line with sum(level=0) would also work
)

Output:

variable
A    8
B    5
C    6
dtype: int64

Upvotes: 2

BENY
BENY

Reputation: 323396

Is this what you need ?

s=df.melt()
s.value.value_counts().loc[lambda x : x>=5]
47    8
33    6
62    5
Name: value, dtype: int64

Update

s=df.melt()
s.groupby(list(s)).size().loc[lambda x : x>=5]
variable  value
A         47       8
B         62       5
C         33       6
dtype: int64

Upvotes: 2

Related Questions