Reputation: 1571
I have the following dataframe:
import pandas as pd
data = {'A': [0,0,0,1,1,1,0,1], 'B': [0,1,1,1,1,1,1,1], 'C': [1,0,1,0,1,1,1,0]}
df = pd.DataFrame(data)
df = df.transpose()
df = df.rename(index=str, columns={0: "20062", 1: "20063", 2: "20064", 3: "20071", 4: "20072", 5: "20073", 6: "20074", 7: "20081"})
Out[135]:
20062 20063 20064 20071 20072 20073 20074 20081
A 0 0 0 1 1 1 0 1
B 0 1 1 1 1 1 1 1
C 1 0 1 0 1 1 1 0
My main task is to find the number of "disappearances".
Let a "disappearance" be defined as the case where a 1 is immediately followed by a 0.
So the expected outcome in this example is that A disappears only once (in 20074), B disappears zero times, and C disappears 3 times (in 20063, 20071, and 20081).
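To pin the definition down, here is a minimal plain-Python sketch of the count for a single row. The helper name `count_disappearances` is only illustrative, and this loop-style version is meant as a reference for checking results, not as the fast solution for a large dataframe.

```python
def count_disappearances(values):
    """Count positions where a 1 is immediately followed by a 0."""
    return sum(
        1 for prev, cur in zip(values, values[1:]) if prev == 1 and cur == 0
    )

rows = {
    'A': [0, 0, 0, 1, 1, 1, 0, 1],
    'B': [0, 1, 1, 1, 1, 1, 1, 1],
    'C': [1, 0, 1, 0, 1, 1, 1, 0],
}

# Apply the definition row by row.
expected = {name: count_disappearances(vals) for name, vals in rows.items()}
print(expected)  # {'A': 1, 'B': 0, 'C': 3}
```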
Can someone help me with how to do this in Python?
My dataframe is quite large, so I would ideally look for a general solution.
Thanks
Upvotes: 1
Views: 1200
Reputation: 59274
You can use diff and sum across axis=None to get the total number of disappearances:
>>> df.diff(axis=1).eq(-1).values.sum(axis=None)
4
To get the count per row, sum across axis=1:
df.diff(axis=1).eq(-1).sum(axis=1)
A 1
B 0
C 3
dtype: int64
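For anyone who wants to see why this works, the one-liner can be unrolled into steps. This is a self-contained sketch that rebuilds the example frame from the question; the intermediate names `step`, `disappeared`, `per_row` and `total` are illustrative only.

```python
import pandas as pd

data = {'A': [0,0,0,1,1,1,0,1], 'B': [0,1,1,1,1,1,1,1], 'C': [1,0,1,0,1,1,1,0]}
periods = ["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"]
df = pd.DataFrame(data).transpose()
df.columns = periods

# diff(axis=1) subtracts the previous column from each column;
# the first column has no predecessor, so it becomes NaN.
step = df.diff(axis=1)

# A 1 -> 0 transition shows up as -1. NaN never equals -1,
# so the first column can never count as a disappearance.
disappeared = step.eq(-1)

per_row = disappeared.sum(axis=1)          # count per row
total = int(disappeared.to_numpy().sum())  # count over the whole frame
print(per_row)
print(total)
```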
To get the count per time period, sum across axis=0:
df.diff(axis=1).eq(-1).sum(axis=0)
20062 0
20063 1
20064 0
20071 1
20072 0
20073 0
20074 1
20081 1
dtype: int64
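The question also lists the periods in which each row disappears. The same boolean mask can give that too. A small sketch, assuming the example frame from the question (the name `where` is illustrative):

```python
import pandas as pd

data = {'A': [0,0,0,1,1,1,0,1], 'B': [0,1,1,1,1,1,1,1], 'C': [1,0,1,0,1,1,1,0]}
periods = ["20062", "20063", "20064", "20071", "20072", "20073", "20074", "20081"]
df = pd.DataFrame(data).transpose()
df.columns = periods

# True wherever a 1 in the previous period is followed by a 0.
disappeared = df.diff(axis=1).eq(-1)

# For each row, collect the column labels where the mask is True.
where = {
    name: [col for col, flag in row.items() if flag]
    for name, row in disappeared.iterrows()
}
print(where)  # {'A': ['20074'], 'B': [], 'C': ['20063', '20071', '20081']}
```

Note that iterrows loops over rows in Python, so on a very large frame this listing step will be slower than the vectorized sums.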
Upvotes: 3
Reputation: 323226
First mask all 0 values to NaN, then ffill along each row, find where this new DataFrame differs from the original one, and sum the differences:
(df.mask(df==0).ffill(axis=1).fillna(0)!=df).sum(axis=1)
Out[146]:
A 1
B 0
C 3
dtype: int64
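One thing worth checking before using this on real data: the mask/ffill approach counts every 0 that comes after the first 1 in a row, so it agrees with the "1 immediately followed by 0" definition only when a row never has two zeros in a row after a 1. A quick sketch on a made-up row (the frame `df2` and the label `X` are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical row with two consecutive zeros after a 1.
df2 = pd.DataFrame([[1, 0, 0, 1]], index=['X'])

# mask/ffill approach: counts both zeros.
ffill_count = (df2.mask(df2 == 0).ffill(axis=1).fillna(0) != df2).sum(axis=1)

# diff approach: counts only the single 1 -> 0 transition.
diff_count = df2.diff(axis=1).eq(-1).sum(axis=1)

print(ffill_count['X'], diff_count['X'])  # 2 1
```

On the example data in the question the two approaches give the same result, because no row there contains such a run.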
Upvotes: 1