Prathamesh Mohite
Prathamesh Mohite

Reputation: 31

Pattern identification and sequence detection

I have a dataset 'df' that looks something like this:

MEMBER  seen_1   seen_2   seen_3   seen_4   seen_5   seen_6
  A       1        0        0         1       0        1
  B       1        1        0         0       1        0
  C       1        1        1         0       0        1
  D       0        0        1         0       0        1

As you can see there are several rows of ones and zeros. Can anyone suggest me a code in python such that I am able to count the number of times '1' occurs continuously before the first occurrence of a 1, 0 and 0 in order. For example, for member A, the first double zero event occurs at seen_2 and seen_3, so the event will be 1. Similarly for the member B, the first double zero event occurs at seen_3 and seen_4 so there are two 1s that occur before this. The resultant table should have a new column 'event' something like this:

MEMBER  seen_1   seen_2   seen_3   seen_4   seen_5   seen_6  event 
  A       1        0        0         1       0        1       1
  B       1        1        0         0       1        0       2
  C       1        1        1         0       0        1       3
  D       0        0        1         0       0        1       1

Upvotes: 2

Views: 100

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

My approach:

df = df.set_index('MEMBER')

# count 1 on each rows since the last 0
s = (df.stack()
       .groupby(['MEMBER', df.eq(0).cumsum(1).stack()])
       .cumsum().unstack()
    )

# mask of the zeros:
u = s.eq(0)

# look for the first 1 0 0
idx = (~u & 
        u.shift(-1, axis=1, fill_value=False) & 
        u.shift(-2, axis=1, fill_value=False) ).idxmax(1)

# look up
df['event'] = s.lookup(idx.index, idx)

Test data:

  MEMBER  seen_1  seen_2  seen_3  seen_4  seen_5  seen_6
0      A       1       0       1       0       0       1
1      B       1       1       0       0       1       0
2      C       1       1       1       0       0       1
3      D       0       0       1       0       0       1
4      E       1       0       1       1       0       0

Output:

  MEMBER  seen_1  seen_2  seen_3  seen_4  seen_5  seen_6  event
0      A       1       0       1       0       0       1      1
1      B       1       1       0       0       1       0      2
2      C       1       1       1       0       0       1      3
3      D       0       0       1       0       0       1      1
4      E       1       0       1       1       0       0      2

Upvotes: 2

Related Questions