Reputation: 780
I have the following df:
A B C
0 47 50 22
1 47 53 28
2 47 65 33
3 47 62 33
4 47 62 33
5 47 62 33
6 47 62 33
7 47 62 33
I'd like to create an array with the amount of times that any value repeats itself more than 5 times and sum it. In the above example the output that I'd like it would be:
output = [8 5 6]
Is there a way to do it using pandas
methods? I almost got the correct output but using for loops and it takes too long since I have +1000 columns and almost 7000 rows.
Upvotes: 1
Views: 1012
Reputation: 150825
IIUC, you want to find the consecutive blocks within each column and add their sizes up:
# compare cells with previous
# and stack into a series
s = df.eq(df.shift()).melt()
# groups of consecutive equal values
groups = (~s['value']).groupby(s['variable']).cumsum()
(s.groupby(['variable', groups])
.size()
.loc[lambda x: x>=5] # stole from BEN's answer
.groupby(level=0).sum() # replace this line with sum(level=0) would also work
)
Output:
variable
A 8
B 5
C 6
dtype: int64
Upvotes: 2
Reputation: 323396
Is this what you need ?
s=df.melt()
s.value.value_counts().loc[lambda x : x>=5]
47 8
33 6
62 5
Name: value, dtype: int64
Update
s=df.melt()
s.groupby(list(s)).size().loc[lambda x : x>=5]
variable value
A 47 8
B 62 5
C 33 6
dtype: int64
Upvotes: 2