mjoy

Reputation: 680

Check if the next value in a column is the same, based on conditions, in Python?

I have a dataframe like this:

country  question    year   value
   1      a1          2017     Y
   1      a1          2018     Y
   1      a1          2019     N
   1      a2          2017     N
   1      a2          2018     N
   1      a2          2019     Y
   2      a1          2017     Y
   2      a1          2018     Nan
   2      a1          2019     Y
   2      a2          2017     Y
   2      a2          2018     N
   2      a2          2019     Y
   3      a1          2017     Y
   3      a1          2018     N
   3      a1          2019     Y
   3      a2          2017     Y
   3      a2          2018     Y
   3      a2          2019     Y

For each country and question, I want to flag whether a year's value matches the value from the previous year. I have tried using shift, but it doesn't give me what I want. This is how far I've gotten:

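country = all_data['country']
question = all_data['question']
value = all_data['value']

# Assumes the default 0..n-1 index and rows sorted by country, question, year.
match = [''] * len(country)
for i in range(len(country) - 1):  # stop one short so i + 1 stays in range
    # Python uses `and`, not `&&`
    if country[i] == country[i + 1] and question[i] == question[i + 1]:
        match[i + 1] = value[i] == value[i + 1]
all_data['match'] = match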

Desired output:

country  question    year   value      match
   1      a1          2017     Y        
   1      a1          2018     Y         T
   1      a1          2019     N         F
   1      a2          2017     N         
   1      a2          2018     N         T
   1      a2          2019     Y         F
   2      a1          2017     Y           
   2      a1          2018     Nan       F
   2      a1          2019     Y         F
   2      a2          2017     Y      
   2      a2          2018     N         F
   2      a2          2019     Y         F
   3      a1          2017     Y         
   3      a1          2018     N         F
   3      a1          2019     Y         F
   3      a2          2017     Y 
   3      a2          2018     Y         T
   3      a2          2019     Y         T

Thank you for any suggestions!!

Upvotes: 1

Views: 1139

Answers (2)

Shubham Sharma

Reputation: 71689

Group the dataframe by country and question, shift the value column down by one row within each group, then compare the shifted values with the original value column to build a boolean mask. Rows that have no previous value (the first year of each group) are blanked out.

s = df.groupby(['country', 'question'])['value'].shift()
df['match'] = df['value'].eq(s).mask(s.isna(), '')

    country question  year value  match
0         1       a1  2017     Y       
1         1       a1  2018     Y   True
2         1       a1  2019     N  False
3         1       a2  2017     N       
4         1       a2  2018     N   True
5         1       a2  2019     Y  False
6         2       a1  2017     Y       
7         2       a1  2018   Nan  False
8         2       a1  2019     Y  False
9         2       a2  2017     Y       
10        2       a2  2018     N  False
11        2       a2  2019     Y  False
12        3       a1  2017     Y       
13        3       a1  2018     N  False
14        3       a1  2019     Y  False
15        3       a2  2017     Y       
16        3       a2  2018     Y   True
17        3       a2  2019     Y   True
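
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the question: the sample frame is rebuilt by hand with "Nan" as a plain string (which is what the output above suggests), the rows are sorted explicitly so that shift really sees the previous year, and the result is stored with the nullable "boolean" dtype, so the first year of each group shows as <NA> rather than an empty string.

```python
import pandas as pd

# Sample data rebuilt from the question; "Nan" is assumed to be a string.
df = pd.DataFrame({
    "country":  [1] * 6 + [2] * 6 + [3] * 6,
    "question": (["a1"] * 3 + ["a2"] * 3) * 3,
    "year":     [2017, 2018, 2019] * 6,
    "value":    ["Y", "Y", "N", "N", "N", "Y",
                 "Y", "Nan", "Y", "Y", "N", "Y",
                 "Y", "N", "Y", "Y", "Y", "Y"],
})

# Make sure each group is in year order before shifting.
df = df.sort_values(["country", "question", "year"]).reset_index(drop=True)

# Previous year's value within each (country, question) group;
# the first year of every group has no predecessor and becomes NaN.
prev = df.groupby(["country", "question"])["value"].shift()

# Nullable boolean column: True/False where a previous value exists, <NA> otherwise.
df["match"] = df["value"].eq(prev).astype("boolean").mask(prev.isna())

print(df)
```

Keeping the column as "boolean" instead of mixing True/False with empty strings means it can still be filtered or counted directly, for example with df[df["match"].fillna(False)].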

Upvotes: 2

BENY

Reputation: 323266

Use groupby with shift to get the previous value in each group, compare it with value, then blank out the first row of each group, which duplicated identifies:

df['new'] = df.groupby(['country','question']).value.shift().\
                   eq(df.value).mask(~df[['country','question']].duplicated(),'')
Out[435]: 
0          
1      True
2     False
3          
4      True
5     False
6          
7     False
8     False
9          
10    False
11    False
12         
13    False
14    False
15         
16     True
17     True
Name: value, dtype: object
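
One difference from masking with isna on the shifted column only shows up when the data contains real missing values rather than the string "Nan". The sketch below uses a small hypothetical frame (one group, with np.nan in the middle year, which is an assumption and not the question's data) and compares the two masks directly.

```python
import numpy as np
import pandas as pd

# Hypothetical single group with a real missing value in 2018.
df = pd.DataFrame({
    "country":  [2, 2, 2],
    "question": ["a1", "a1", "a1"],
    "year":     [2017, 2018, 2019],
    "value":    ["Y", np.nan, "Y"],
})

keys = ["country", "question"]
prev = df.groupby(keys)["value"].shift()

# Mask from the shifted values: True wherever the previous value is missing,
# which includes rows whose previous value is itself NaN.
blank_by_isna = prev.isna()

# Mask from the group keys: True only for the first row of each group.
blank_by_first = ~df[keys].duplicated()

print(blank_by_isna.tolist())   # [True, False, True]
print(blank_by_first.tolist())  # [True, False, False]
```

With the duplicated mask the 2019 row is still reported as False (its previous value is missing, so it cannot match), while the isna mask would leave it blank. Which behaviour is wanted depends on how missing years should be treated.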

Upvotes: 1
