Reputation: 19
I want to get the maximum number of consecutive 1s and 0s in each row of a pandas DataFrame.
import pandas as pd
d=[[0,0,1,0,1,0],[0,0,0,1,1,0],[1,0,1,1,1,1]]
df = pd.DataFrame(data=d)
df
Out[4]:
0 1 2 3 4 5
0 0 0 1 0 1 0
1 0 0 0 1 1 0
2 1 0 1 1 1 1
Output should look something like this:
Out[5]:
0 1 2 3 4 5 Ones Zeros
0 0 0 1 0 1 0 1 2
1 0 0 0 1 1 0 2 3
2 1 0 1 1 1 1 4 1
Upvotes: 1
Views: 290
Reputation: 19
None of the solutions worked the way I wanted, so I eventually figured it out myself:
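m1 = df.eq(0)  # True where the cell is 0
m2 = df.eq(1)  # True where the cell is 1
# Series.value_counts is called per row here because the top-level pd.value_counts is deprecated in recent pandas releases
df['Ones'] = m1.cumsum(axis=1)[m2].apply(lambda s: s.value_counts(), axis=1).max(axis=1)
df['Zeros'] = m2.cumsum(axis=1)[m1].apply(lambda s: s.value_counts(), axis=1).max(axis=1)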
Output
In[16]: df
Out[16]:
0 1 2 3 4 5 Ones Zeros
0 0 0 1 0 1 0 1.0 2.0
1 0 0 0 1 1 0 2.0 3.0
2 1 0 1 1 1 1 4.0 1.0
3 1 0 1 1 1 1 4.0 1.0
4 1 0 1 1 1 1 4.0 1.0
5 1 0 1 1 1 1 4.0 1.0
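To see why this works, here is a small sketch of the intermediate steps on the three rows from the question. The running count of zeros does not change inside a run of ones, so every run of ones ends up with its own label, and counting how often each label occurs gives the run lengths. The names `labels`, `labels_at_ones` and `run_lengths` are only illustrative, and the final cast to integers is an addition that the answer itself does not make.

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 1, 0], [1, 0, 1, 1, 1, 1]])

m1 = df.eq(0)  # True where the cell is 0
m2 = df.eq(1)  # True where the cell is 1

# Running count of zeros per row; it stays constant inside a run of ones,
# so all cells of one run of ones share the same value.
labels = m1.cumsum(axis=1)

# Keep the labels only where the cell holds a 1 (all other cells become NaN).
labels_at_ones = labels[m2]

# Count each label per row: every count is the length of one run of ones.
run_lengths = labels_at_ones.apply(lambda row: row.value_counts(), axis=1)

# Longest run of ones per row; fillna(0) covers rows that contain no 1 at all.
ones = run_lengths.max(axis=1).fillna(0).astype(int)
print(ones.tolist())  # [1, 2, 4]
```

Swapping `m1` and `m2` gives the longest run of zeros in the same way.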
Thanks for your help!
Upvotes: 0
Reputation: 42916
Making use of boolean masking with eq and shift: we check whether the current value and a neighbouring value (the previous one for 0, the next one for 1) are both equal to 0 or both equal to 1. This gives arrays of True and False, which we can sum over axis=1:
m1 = df.eq(0) & df.shift(axis=1).eq(0) # check if current value is 0 and previous value is 0
m2 = df.shift(axis=1).isna() # take into account the first column, which doesn't have a previous value
m3 = df.eq(1) & df.shift(-1, axis=1).eq(1) # check if current value is 1 and next value is 1
m4 = df.shift(-1, axis=1).isna() # take into account the last column, which doesn't have a next value
df['Ones'] = (m3 | m4).sum(axis=1) # m3 and m4 are the masks built for the 1s
df['Zeros'] = (m1 | m2).sum(axis=1) # m1 and m2 are the masks built for the 0s
Output
0 1 2 3 4 5 Ones Zeros
0 0 0 1 0 1 0 1 2
1 0 0 0 1 1 0 2 3
2 1 0 1 1 1 1 4 1
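One caveat worth checking: these masks count every pair of equal neighbours in a row rather than the length of a single run, so the sum matches the longest run only when a value has at most one run longer than one cell. A minimal check, assuming a hypothetical row `[0, 0, 1, 0, 0]` that is not part of the question's data, with `itertools.groupby` used as the reference:

```python
import pandas as pd
from itertools import groupby

# Hypothetical row with two separate runs of zeros (not from the question).
row = pd.DataFrame([[0, 0, 1, 0, 0]])

# Pair-counting masks for zeros, built the same way as m1 and m2 above.
prev = row.shift(axis=1)
pair_count = int(((row.eq(0) & prev.eq(0)) | prev.isna()).sum(axis=1).iloc[0])

# Reference value: the longest run of zeros found with itertools.groupby.
longest = max(len(list(run)) for val, run in groupby(row.iloc[0]) if val == 0)

print(pair_count, longest)  # 3 2
```

For the three rows in the question the two numbers happen to coincide, which is why the output above looks right.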
Upvotes: 1
Reputation: 1233
With inspiration given by this answer:
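from itertools import groupby

def len_iter(items):
    # Count the elements of an iterator without building a list
    return sum(1 for _ in items)

def consecutive_values(data, bin_val):
    # Longest run of bin_val in data; default=0 covers rows that never contain bin_val
    return max((len_iter(run) for val, run in groupby(data) if val == bin_val), default=0)

bits = df.copy()  # snapshot, so the new "Ones" column is not scanned when computing "Zeros"
df["Ones"] = bits.apply(consecutive_values, bin_val=1, axis=1)
df["Zeros"] = bits.apply(consecutive_values, bin_val=0, axis=1)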
This will give you:
0 1 2 3 4 5 Ones Zeros
0 0 0 1 0 1 0 1 2
1 0 0 0 1 1 0 2 3
2 1 0 1 1 1 1 4 1
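For readers unfamiliar with `itertools.groupby`, here is a small sketch of what it yields for the first row of the question. It splits the sequence into runs of equal neighbouring values, and the approach above keeps the longest run of the requested value. The helper name `run_lengths` is illustrative only.

```python
from itertools import groupby

row = [0, 0, 1, 0, 1, 0]  # first row of the question's DataFrame

def run_lengths(data):
    # One (value, length) pair per run of equal neighbouring values.
    return [(val, sum(1 for _ in run)) for val, run in groupby(data)]

runs = run_lengths(row)
print(runs)  # [(0, 2), (1, 1), (0, 1), (1, 1), (0, 1)]

# Longest run per value; default=0 covers a value that never occurs.
longest_ones = max((n for val, n in runs if val == 1), default=0)
longest_zeros = max((n for val, n in runs if val == 0), default=0)
print(longest_ones, longest_zeros)  # 1 2
```

Because `groupby` only merges neighbouring equal values, no sorting is needed, and each row is scanned once.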
Upvotes: 1