Reputation: 680
I have a dataframe like this:
country  question  year  value
1        a1        2017  Y
1        a1        2018  Y
1        a1        2019  N
1        a2        2017  N
1        a2        2018  N
1        a2        2019  Y
2        a1        2017  Y
2        a1        2018  Nan
2        a1        2019  Y
2        a2        2017  Y
2        a2        2018  N
2        a2        2019  Y
3        a1        2017  Y
3        a1        2018  N
3        a1        2019  Y
3        a2        2017  Y
3        a2        2018  Y
3        a2        2019  Y
Within each country and question, I want to check whether each year's value matches the value from the previous year. I have tried using shift, but it doesn't give me what I want. This is how far I've gotten:
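country = all_data['country']
question = all_data['question']
value = all_data['value']
# stop one row early so that i+1 stays in range
for i in range(len(country) - 1):
    # Python uses `and`, not `&&`
    if country[i] == country[i+1] and question[i] == question[i+1]:
        ...  # compare value[i] with value[i+1] here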
Desired output:
country  question  year  value  match
1        a1        2017  Y
1        a1        2018  Y      T
1        a1        2019  N      F
1        a2        2017  N
1        a2        2018  N      T
1        a2        2019  Y      F
2        a1        2017  Y
2        a1        2018  Nan    F
2        a1        2019  Y      F
2        a2        2017  Y
2        a2        2018  N      F
2        a2        2019  Y      F
3        a1        2017  Y
3        a1        2018  N      F
3        a1        2019  Y      F
3        a2        2017  Y
3        a2        2018  Y      T
3        a2        2019  Y      T
Thank you for any suggestions!!
Upvotes: 1
Views: 1139
Reputation: 71689
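Let us group the dataframe on country and question, then shift the value column one row down within each group. Finally, compare the shifted value column with the value column from the original dataframe to create a boolean mask. The first row of each group has no previous value, so it is masked with an empty string.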
s = df.groupby(['country', 'question'])['value'].shift()
df['match'] = df['value'].eq(s).mask(s.isna(), '')
    country  question  year  value  match
0         1        a1  2017      Y
1         1        a1  2018      Y   True
2         1        a1  2019      N  False
3         1        a2  2017      N
4         1        a2  2018      N   True
5         1        a2  2019      Y  False
6         2        a1  2017      Y
7         2        a1  2018    Nan  False
8         2        a1  2019      Y  False
9         2        a2  2017      Y
10        2        a2  2018      N  False
11        2        a2  2019      Y  False
12        3        a1  2017      Y
13        3        a1  2018      N  False
14        3        a1  2019      Y  False
15        3        a2  2017      Y
16        3        a2  2018      Y   True
17        3        a2  2019      Y   True
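For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and applies the same two lines. Two things are assumptions on my part: the Nan entry is kept as a plain string (which is what the output above suggests), and the rows are sorted by year within each group before shifting, since shift only looks at row order.

```python
import pandas as pd

# Sample data from the question; 'Nan' is kept as a literal string (assumption).
values = ['Y', 'Y', 'N', 'N', 'N', 'Y',
          'Y', 'Nan', 'Y', 'Y', 'N', 'Y',
          'Y', 'N', 'Y', 'Y', 'Y', 'Y']
df = pd.DataFrame({
    'country': [c for c in (1, 2, 3) for _ in range(6)],
    'question': (['a1'] * 3 + ['a2'] * 3) * 3,
    'year': [2017, 2018, 2019] * 6,
    'value': values,
})

# shift() follows row order, so make sure years are ascending within each group.
df = df.sort_values(['country', 'question', 'year']).reset_index(drop=True)

# Previous year's value within each (country, question) group.
s = df.groupby(['country', 'question'])['value'].shift()

# True/False where a previous value exists, '' for the first year of each group.
df['match'] = df['value'].eq(s).mask(s.isna(), '')
print(df)
```

If the data is already ordered by year, the sort_values line can be dropped.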
Upvotes: 2
Reputation: 323266
Let us do groupby with shift, then mask the first row of each group using duplicated.
df['new'] = df.groupby(['country','question']).value.shift().\
    eq(df.value).mask(~df[['country','question']].duplicated(),'')
Out[435]:
0
1 True
2 False
3
4 True
5 False
6
7 False
8 False
9
10 False
11 False
12
13 False
14 False
15
16 True
17 True
Name: value, dtype: object
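One practical difference between masking with duplicated and masking on the shifted column being null shows up only if the missing entry is a real NaN rather than the string Nan. The sketch below is a small illustration on a single group, not part of either answer; the names prev, same, by_isna and by_first are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'country': [2, 2, 2],
    'question': ['a1', 'a1', 'a1'],
    'year': [2017, 2018, 2019],
    'value': ['Y', np.nan, 'Y'],  # a real missing value, not the string 'Nan'
})

# Previous row's value within the group, and whether it equals the current one.
prev = df.groupby(['country', 'question'])['value'].shift()
same = df['value'].eq(prev)

# Mask where there is no previous value: this also blanks the row after a real NaN.
by_isna = same.mask(prev.isna(), '')

# Mask only the first row of each group: the row after the NaN stays False.
by_first = same.mask(~df[['country', 'question']].duplicated(), '')

print(by_isna.tolist())
print(by_first.tolist())
```

With string data like the sample in the question, both approaches give the same result.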
Upvotes: 1