Md Mahmudul Hasan
Md Mahmudul Hasan

Reputation: 149

How to identify where a particular sequence in a row occurs for the first time

I have a dataframe in pandas, an example of which is provided below:

Person  appear_1   appear_2   appear_3   appear_4   appear_5  appear_6 
  A        1          0          0           1         0          0       
  B        1          1          0           0         1          0 
  C        1          0          1           1         0          0
  D        0          0          1           0         0          1
  E        1          1          1           1         1          1

As you can see 1 and 0 occurs randomly in different columns. It would be helpful, if anyone can suggest me a code in python such that I am able to find the column number where the 1 0 0 pattern occurs for the first time. For example, for member A, the first 1 0 0 pattern occurs at appear_1. so the first occurrence will be 1. Similarly for the member B, the first 1 0 0 pattern occurs at appear_2, so the first occurrence will be at column 2. The resulting table should have a new column named 'first_occurrence'. If there is no such 1 0 0 pattern occurs (like in row E) then the value in first occurrence column will the sum of number of 1 in that row. The resulting table should look something like this:

Person  appear_1   appear_2   appear_3   appear_4   appear_5  appear_6  first_occurrence
  A        1          0          0           1         0          0           1
  B        1          1          0           0         1          0           2
  C        1          0          1           1         0          0           4
  D        0          0          1           0         0          1           3
  E        1          1          1           1         1          1           6

Thank you in advance.

Upvotes: 0

Views: 45

Answers (1)

Andy L.
Andy L.

Reputation: 25259

I try not to reinvent the wheel, so I develop on my answer to previous question. From that answer, you need to use additional idxmax, np.where, and get_indexer

cols = ['appear_1', 'appear_2', 'appear_3', 'appear_4', 'appear_5', 'appear_6']
df1 = df[cols]
m = df1[df1.eq(1)].ffill(1).notna()
df2 = df1[m].bfill(1).eq(0)
m2 = df2 & df2.shift(-1, axis=1, fill_value=True)

df['first_occurrence'] = np.where(m2.any(1), df1.columns.get_indexer(m2.idxmax(1)), 
                                             df1.shape[1])

Out[540]:
  Person  appear_1  appear_2  appear_3  appear_4  appear_5  appear_6  first_occurrence
0      A         1         0         0         1         0         0         1
1      B         1         1         0         0         1         0         2
2      C         1         0         1         1         0         0         4
3      D         0         0         1         0         0         1         3
4      E         1         1         1         1         1         1         6

Upvotes: 2

Related Questions