Prathamesh Mohite

Reputation: 31

Pattern Identification in a given dataset

I have a DataFrame in Python where the values 1 and 0 appear in different columns in every row. I want to create an additional column that counts the number of times a '1' is immediately followed by a '0' in that row. For example, let's say I have a dataset that looks like this:

IDs  q1  q2  q3  q4  q5  q6  q7  q8
A     0   1   1   1   0   0   1   1
B     1   0   1   1   1   1   0   1
C     1   0   1   0   1   0   0   1

I want the output column to look like this:

IDs  q1  q2  q3  q4  q5  q6  q7  q8  output
A     0   1   1   1   0   0   1   1       1
B     1   0   1   1   1   1   0   1       2
C     1   0   1   0   1   0   0   1       3

If someone can provide the code for this in Python 3, it would be a great help. Thanks in advance.

Upvotes: 0

Views: 71

Answers (1)

Erfan

Reputation: 42916

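Use eq to check whether a value equals 1, and shift(-1, axis=1) to check whether the next value in the same row equals 0. Where both hold, a 1 is immediately followed by a 0, so summing that mask over axis=1 gives the count per row: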

m = df.eq(1) & df.shift(-1,axis=1).eq(0)
df['Output'] = m.sum(axis=1)

Output

   q1  q2  q3  q4  q5  q6  q7  q8  Output
0   0   1   1   1   0   0   1   1       1
1   1   0   1   1   1   1   0   1       2
2   1   0   1   0   1   0   0   1       3
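For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes IDs is a regular column (as in the question) rather than the index, so the q columns are selected first. The names `rows`, `columns`, `q` and `mask` are just illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question.
rows = [
    ["A", 0, 1, 1, 1, 0, 0, 1, 1],
    ["B", 1, 0, 1, 1, 1, 1, 0, 1],
    ["C", 1, 0, 1, 0, 1, 0, 0, 1],
]
columns = ["IDs"] + [f"q{i}" for i in range(1, 9)]
df = pd.DataFrame(rows, columns=columns)

# Work only on the 0/1 columns (assumption: everything except IDs).
q = df.drop(columns="IDs")

# True where a cell is 1 and the cell to its right is 0.
# shift(-1, axis=1) moves each row one column to the left, so the
# last column is compared against NaN and never matches.
mask = q.eq(1) & q.shift(-1, axis=1).eq(0)

# Count the matches per row.
df["output"] = mask.sum(axis=1)
print(df)
```

On the sample data this gives 1, 2 and 3 for rows A, B and C.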

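Or, since the data contains only 0 and 1, we can check where the difference (diff) between a value and the one before it in the same row equals -1, which only happens when a 1 is followed by a 0: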

df['Output'] = df.diff(axis=1).eq(-1).sum(axis=1)

Output

   q1  q2  q3  q4  q5  q6  q7  q8  Output
0   0   1   1   1   0   0   1   1       1
1   1   0   1   1   1   1   0   1       2
2   1   0   1   0   1   0   0   1       3
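To see why -1 marks the pattern, it may help to look at the intermediate differences for a single row. This is only a sketch on the first sample row, assuming integer 0/1 values. The names `row`, `steps` and `count` are illustrative.

```python
import pandas as pd

# First sample row (ID "A") as a Series of integer 0/1 values.
row = pd.Series([0, 1, 1, 1, 0, 0, 1, 1], index=[f"q{i}" for i in range(1, 9)])

# Each entry is the current value minus the previous one.
# The first entry has no predecessor, so it is NaN.
steps = row.diff()
print(steps.tolist())  # [nan, 1.0, 0.0, 0.0, -1.0, 0.0, 1.0, 0.0]

# A step of -1 means the value dropped from 1 to 0.
count = int(steps.eq(-1).sum())
print(count)  # 1
```

A step of +1 would mark the opposite pattern (a 0 followed by a 1), and 0 means the value did not change.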

Upvotes: 1
