Reputation: 497
I have the following DataFrame as input:
import pandas as pd
l = [2,2,2,5,5,5,3,3,2,2,4,4,6,5,5,3,5]
df = pd.DataFrame(l)
print(df)
    0
0   2
1   2
2   2
3   5
4   5
5   5
6   3
7   3
8   2
9   2
10  4
11  4
12  6
13  5
14  5
15  3
16  5
As output, I would like the total number of sequences that meet a certain condition. For example, in this case I want the number of runs of consecutive values greater than 3, so the output is 3 (the runs 5, 5, 5; 4, 4, 6, 5, 5; and 5).
Is there a way to calculate this in pandas without a for-loop? I have already implemented a solution using a for-loop, and I wonder if there is a better approach using pandas in O(N) time.
Thanks very much!
Related to this question: How to count the number of time intervals that meet a boolean condition within a pandas dataframe?
Upvotes: 2
Views: 125
Reputation: 862831
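You can build a boolean mask, label each run with the cumulative sum of the inverted mask, and then keep only the matching rows:
m = df[0] > 3            # True where the value is greater than 3
df[1] = (~m).cumsum()    # rows in the same run of True values share a label
df = df[m]               # keep only the rows that meet the condition
print(df)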
    0  1
3   5  3
4   5  3
5   5  3
10  4  7
11  4  7
12  6  7
13  5  7
14  5  7
16  5  8
# create tuples (assign to a new name so df stays a DataFrame)
s = df.groupby(1)[0].apply(tuple).value_counts()
print(s)
(5, 5, 5) 1
(4, 4, 6, 5, 5) 1
(5,) 1
Name: 0, dtype: int64
# alternatively, create strings
s = df.astype(str).groupby(1)[0].apply(''.join).value_counts()
print(s)
5 1
44655 1
555 1
Name: 0, dtype: int64
If you need the output as a list:
print (df.astype(str).groupby(1)[0].apply(''.join).tolist())
['555', '44655', '5']
Detail:
print (df.astype(str).groupby(1)[0].apply(''.join))
3 555
7 44655
8 5
Name: 0, dtype: object
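Since the question asks only for the number of runs (3 here), the grouping step can be skipped. The following is a minimal sketch, not part of the code above: it rebuilds the mask on a fresh Series, and the names `s`, `run_starts`, `n_runs`, and `n_runs_labels` are introduced for illustration. It counts the positions where the mask switches from False to True, and cross-checks that against the number of distinct run labels.

```python
import pandas as pd

l = [2, 2, 2, 5, 5, 5, 3, 3, 2, 2, 4, 4, 6, 5, 5, 3, 5]
s = pd.Series(l)

# True where the value satisfies the condition
m = s > 3

# A run starts where the mask is True and the previous row was False.
# fill_value=False treats the row before the first one as non-matching.
run_starts = m & ~m.shift(fill_value=False)
n_runs = int(run_starts.sum())

# Cross-check with the run labels: every run of matching rows
# shares one value of (~m).cumsum(), so count the distinct labels.
n_runs_labels = int((~m).cumsum()[m].nunique())

print(n_runs, n_runs_labels)  # 3 3
```

Both variants use only vectorized operations and a single pass over the data, so they should run in O(N) time.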
Upvotes: 2
Reputation: 27879
If you don't need pandas, this will suit your needs:
l = [2,2,2,5,5,5,3,3,2,2,4,4,6,5,5,3,5]
def consecutive(array, value):
    result = []
    sub = []
    for item in array:
        if item > value:
            sub.append(item)
        else:
            if sub:
                result.append(sub)
            sub = []
    if sub:
        result.append(sub)
    return result
print(consecutive(l,3))
#[[5, 5, 5], [4, 4, 6, 5, 5], [5]]
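If only the count is needed, `len(consecutive(l, 3))` gives it. A shorter variant is sketched below using `itertools.groupby` from the standard library; the helper name `count_runs` is hypothetical and introduced here only for illustration.

```python
from itertools import groupby

l = [2, 2, 2, 5, 5, 5, 3, 3, 2, 2, 4, 4, 6, 5, 5, 3, 5]

def count_runs(values, threshold):
    """Count maximal runs of consecutive items greater than threshold."""
    # groupby merges consecutive items with the same key, so every
    # group whose key is True corresponds to exactly one run.
    return sum(
        1
        for is_above, _ in groupby(values, key=lambda x: x > threshold)
        if is_above
    )

print(count_runs(l, 3))  # 3
```

This still iterates over the list once, so it is O(N), but it avoids building the intermediate sublists.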
Upvotes: 0