Count Total number of sequences that meet condition, without for-loop

Question

I have the following Dataframe as input:

import pandas as pd

l = [2,2,2,5,5,5,3,3,2,2,4,4,6,5,5,3,5]
df = pd.DataFrame(l)
print(df)
    0
0   2
1   2
2   2
3   5
4   5
5   5
6   3
7   3
8   2
9   2
10  4
11  4
12  6
13  5
14  5
15  3
16  5

As output, I would like the total number of sequences (runs of consecutive rows) that meet a certain condition. For example, in this case I want the number of sequences in which every value is greater than 3, so the output is 3.

1st Sequence = [555]
2nd Sequence = [44655]
3rd Sequence = [5]

Is there a way to calculate this in pandas without a for-loop? I have already implemented a solution using a for-loop, and I wonder whether there is a better approach using pandas that still runs in O(N) time.

Thanks very much!

Related to this question: How to count the number of time intervals that meet a boolean condition within a pandas dataframe?

jezrael · Accepted Answer

You can use:

m = df[0] > 3
df[1] = (~m).cumsum()
df = df[m]
print (df)
    0  1
3   5  3
4   5  3
5   5  3
10  4  7
11  4  7
12  6  7
13  5  7
14  5  7
16  5  8


# create tuples (stored in a new variable so df stays the filtered DataFrame)
counts = df.groupby(1)[0].apply(tuple).value_counts()
print (counts)

(5, 5, 5)          1
(4, 4, 6, 5, 5)    1
(5,)               1
Name: 0, dtype: int64

# alternatively, create strings
counts = df.astype(str).groupby(1)[0].apply(''.join).value_counts()
print (counts)

5        1
44655    1
555      1
Name: 0, dtype: int64

If you need the output as a list:

print (df.astype(str).groupby(1)[0].apply(''.join).tolist())
['555', '44655', '5']

Detail:

print (df.astype(str).groupby(1)[0].apply(''.join))

3      555
7    44655
8        5
Name: 0, dtype: object
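
The question asks only for the number of sequences (3 here), and the output above lists the sequences rather than counting them. A minimal sketch of two ways to get the count directly, assuming the same input list and the same threshold of 3; the names `s`, `starts`, `n_runs` and `n_runs_via_groups` are illustrative and not part of the original answer:

```python
import pandas as pd

l = [2, 2, 2, 5, 5, 5, 3, 3, 2, 2, 4, 4, 6, 5, 5, 3, 5]
s = pd.Series(l)

# True where the condition holds
m = s > 3

# Option 1: a sequence starts wherever m is True and the previous row is False.
# shift(fill_value=False) treats the row "before" the first one as False.
starts = m & ~m.shift(fill_value=False)
n_runs = int(starts.sum())

# Option 2: reuse the group labels from the answer. (~m).cumsum() only
# increases on rows that fail the condition, so every row of one sequence
# shares a label; counting distinct labels among matching rows counts sequences.
group_ids = (~m).cumsum()
n_runs_via_groups = int(group_ids[m].nunique())

print(n_runs, n_runs_via_groups)
```

Both options are vectorized and avoid an explicit Python loop. The first needs only a shift and a sum, so it is probably the simpler choice when the sequences themselves are not required.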
