I have a data-frame:
Id Seqno. Event
1 2 A
1 3 B
1 5 A
1 6 A
1 7 A
1 8 B
1 9 C
1 10 D
I want to filter the data-frame by the number of times Event A happened consecutively. For example, if I ask for Event A occurring more than 2 times in a row, it should return all matching Ids as:
Id Event count
1 A 3
So far I have tried:
df['new'] = df['Event'].shift() + df['Event']
a = df[df['new'] == 'AA']
a[a['Id'].isin(a['Id'].value_counts()[a['Id'].value_counts() > 2].index)]
But it doesn't seem to work.
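One likely reason the attempt returns nothing: the 'AA' column marks pairs of adjacent A rows, and a run of n consecutive A events produces only n - 1 such pairs. The run of 3 above therefore yields 2 pairs and fails the > 2 test (pairs from separate runs of the same Id would also be summed together). A minimal sketch that reproduces the pair count on the sample data, building the pair mask with eq comparisons instead of string concatenation:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [1] * 8,
    "Seqno.": [2, 3, 5, 6, 7, 8, 9, 10],
    "Event": ["A", "B", "A", "A", "A", "B", "C", "D"],
})

# True where this row and the previous row are both "A" (what new == 'AA' marks)
is_pair = df["Event"].eq("A") & df["Event"].shift().eq("A")

# Number of "AA" pairs per Id: the run of three A's contributes only two pairs
pairs_per_id = df.loc[is_pair, "Id"].value_counts()
print(pairs_per_id)

# So the original > 2 threshold keeps no Id at all
print(pairs_per_id[pairs_per_id > 2].index.tolist())  # []
```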
Upvotes: 0
Views: 83
A function that doesn't use pandas internals (which is arguably a better way to do this):
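import numpy as np

def eventmagic(event="A", num=2):
    # Uses the question's df as a global. num is a Seqno. cutoff, not a run length:
    # only rows of `event` with Seqno. greater than num are considered
    subdf = df[(df["Event"] == event) & (df["Seqno."] > num)].sort_values(by="Seqno.")
    arr = subdf["Seqno."].values - np.arange(len(subdf))  # 5,6,7 to 5,5,5
    if len(arr) == 0:
        return 0
    # Count the leading entries that share the first run's key
    i = 0
    while arr[i] == arr[0]:
        i += 1
        if i >= len(subdf):
            break
    return i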
>>> eventmagic("B", 2)
1
>>> eventmagic("A", 1)
1
>>> eventmagic("A", 3)
3
>>> eventmagic("A", 10)
0
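The "5,6,7 to 5,5,5" comment is the key idea: within a run of consecutive Seqno. values, subtracting each row's position leaves a constant. A sketch that applies the same trick to every run per Id rather than only the first one; the helper name run_lengths is hypothetical, and it assumes Seqno. increases by exactly 1 between consecutive rows of a run:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [1] * 8,
    "Seqno.": [2, 3, 5, 6, 7, 8, 9, 10],
    "Event": ["A", "B", "A", "A", "A", "B", "C", "D"],
})

def run_lengths(df, event="A"):
    """Hypothetical helper: size of every run of `event`, keyed by (Id, run)."""
    sub = df[df["Event"] == event].sort_values(["Id", "Seqno."])
    # Seqno. minus position is constant inside a run (5, 6, 7 -> 5, 5, 5);
    # assumes Seqno. increases by exactly 1 between consecutive rows of a run
    run = (sub["Seqno."] - np.arange(len(sub))).rename("run")
    return sub.groupby([sub["Id"], run]).size()

sizes = run_lengths(df, "A")
print(sizes.tolist())  # [1, 3]

# Ids that have a run of A longer than 2
print(sizes[sizes > 2].index.get_level_values("Id").unique().tolist())  # [1]
```

Grouping by Id as well as the run key keeps runs from different Ids apart, which eventmagic does not do.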
Upvotes: -1
This problem can be split into two parts. First, you want to group by both Id and consecutive elements in the Event series. This can be accomplished using shift + cumsum:
m = df.Event.ne(df.Event.shift()).cumsum()
df['count'] = df.groupby(['Id', m])['Event'].transform('size')
print(df)
Id Seqno. Event count
0 1 2 A 1
1 1 3 B 1
2 1 5 A 3
3 1 6 A 3
4 1 7 A 3
5 1 8 B 1
6 1 9 C 1
7 1 10 D 1
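To see what m looks like, and why Id is part of the grouping, here is a small sketch on made-up data (an assumption, not from the question) where a run of A continues from Id 1 into Id 2. The key m alone treats the three A rows as one run, while grouping by ['Id', m] splits it at the Id boundary:

```python
import pandas as pd

# Made-up data: the A run continues from Id 1 into Id 2
df = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "Event": ["A", "A", "A", "B"],
})

# A new run starts whenever Event differs from the previous row
m = df.Event.ne(df.Event.shift()).cumsum()
print(m.tolist())  # [1, 1, 1, 2]

# Grouping by Id as well splits that run at the Id boundary
df["count"] = df.groupby(["Id", m])["Event"].transform("size")
print(df["count"].tolist())  # [2, 2, 1, 1]
```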
This gives us a count column holding the length of the consecutive run in the Event column that each row belongs to, but now we want to make lookup simple. We can use drop_duplicates, so that each Id/Event/count combination appears only once, and then use boolean indexing:
f = df[['Id', 'Event', 'count']].drop_duplicates()
f.loc[f.Event.eq('A') & f['count'].gt(2)]
Id Event count
2 1 A 3
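drop_duplicates collapses separate runs that happen to share the same Id, Event and count, which is fine when you only need the matching Ids. If you want one row per run instead, a sketch using groupby named aggregation (the helper name runs_table and the run column are illustrative, not part of the answer above):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [1] * 8,
    "Seqno.": [2, 3, 5, 6, 7, 8, 9, 10],
    "Event": ["A", "B", "A", "A", "A", "B", "C", "D"],
})

def runs_table(df):
    """Hypothetical helper: one row per run of identical consecutive events within an Id."""
    # Same run key as m above, renamed so it does not clash with the Event column
    run = df["Event"].ne(df["Event"].shift()).cumsum().rename("run")
    return (
        df.groupby(["Id", run])
        .agg(Event=("Event", "first"), count=("Event", "size"))
        .reset_index()
    )

runs = runs_table(df)
hits = runs.loc[runs["Event"].eq("A") & runs["count"].gt(2), ["Id", "Event", "count"]]
print(hits)
```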
Upvotes: 3