user1684046

Reputation: 1939

Compare row and previous row upon change in pandas dataframe

I have longitudinal data of the following form:

import pandas as pd

df = pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1]
})

and a function that compares lists (the details of this aren't important):

def compare(old_fruit, new_fruit):
    if set(new_fruit) - set(old_fruit) == {'pears'}:
        return 1
    else:
        return 0

c is 1 when a change that I am interested in occurs in a and b. I want to find the rows where c = 1, take the values of a and b in that row and in the previous row, compare them using my function, and add a new Series to the dataframe showing the result of the comparison.

For the example above, my desired operation would execute compare(['plums', 'grapes'], ['pears', 'grapes']) and compare(['pears', 'grapes'], ['pears', 'bananas']) and add the Series [0, 0, 1, 0, 0] to the dataframe, i.e. the desired output is a dataframe as follows:

pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1],
    'd': [0, 0, 1, 0, 0]
})

Upvotes: 1

Views: 729

Answers (1)

Zeugma

Reputation: 32095

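Build a set from a and b for each row, then compare each row's set against the previous row's with shift(). This reproduces the logic of your compare function without an explicit loop over rows: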

df_set = df[['a', 'b']].apply(set, axis=1)

df_set
Out[38]: 
0    {grapes, apples}
1     {grapes, plums}
2     {grapes, pears}
3     {grapes, pears}
4    {bananas, pears}
dtype: object

(df_set - df_set.shift()) == {'pears'}
Out[39]: 
0    False
1    False
2     True
3    False
4    False
dtype: bool
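
Note that the boolean Series above does not look at column c; it happens to match the desired output for this sample. If the comparison should only run on rows where c is 1, and should go through the original compare function (so any comparison logic can be plugged in), something along these lines may work. This is a minimal sketch, not a tuned solution: the names current and previous are introduced here for illustration, and treating the first row as having no previous row (result 0) is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1],
})

def compare(old_fruit, new_fruit):
    # Same behaviour as the function in the question.
    return 1 if set(new_fruit) - set(old_fruit) == {'pears'} else 0

# One [a, b] list per row, plus the same lists moved down by one row.
current = df[['a', 'b']].values.tolist()
previous = [None] + current[:-1]  # assumption: row 0 has no previous row

# Call compare() only where c == 1 and a previous row exists; otherwise 0.
df['d'] = [
    compare(old, new) if flag == 1 and old is not None else 0
    for old, new, flag in zip(previous, current, df['c'])
]

print(df['d'].tolist())  # [0, 0, 1, 0, 0]
```

This still loops in Python, so it trades speed for generality; for the specific set-difference test in the question, the shift-based comparison is the shorter route.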

Upvotes: 1
