Reputation: 1939
I have longitudinal data of the following form:
import pandas as pd
df = pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1]
})
and a function which compares lists (the details of this aren't important):
def compare(old_fruit, new_fruit):
    if set(new_fruit) - set(old_fruit) == {'pears'}:
        return 1
    else:
        return 0
Column c is 1 when a change that I am interested in occurs in columns a and b. I want to find the rows where c = 1, grab the values of a and b at that point, plus the values of a and b from the previous row, compare them using my function, and add a new Series to the dataframe showing the result of the comparison.
For the example above, my desired operation would execute compare(['plums', 'grapes'], ['pears', 'grapes']) and compare(['pears', 'grapes'], ['pears', 'bananas']) and add the Series [0, 0, 1, 0, 0] to the dataframe, i.e. the desired output is a dataframe as follows:
pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1],
    'd': [0, 0, 1, 0, 0]
})
Upvotes: 1
Views: 729
Reputation: 32095
You can do the comparison you describe in a vectorized way, by building a set per row and subtracting the previous row's set:
df_set = df[['a', 'b']].apply(set, axis=1)
df_set
Out[38]:
0 {grapes, apples}
1 {grapes, plums}
2 {grapes, pears}
3 {grapes, pears}
4 {bananas, pears}
dtype: object
(df_set - df_set.shift()) == {'pears'}
Out[39]:
0 False
1 False
2 True
3 False
4 False
dtype: bool
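Note that the mask above does not look at c and gives True/False rather than 1/0. If you need to call compare itself only on the rows where c is 1, a minimal sketch could look like the following. It assumes "the previous row" simply means the row directly above in the frame; the names current and previous are illustrative helpers, not anything from pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['apples', 'plums', 'pears', 'pears', 'pears'],
    'b': ['grapes', 'grapes', 'grapes', 'grapes', 'bananas'],
    'c': [0, 0, 1, 0, 1],
})

def compare(old_fruit, new_fruit):
    # Same function as in the question.
    if set(new_fruit) - set(old_fruit) == {'pears'}:
        return 1
    else:
        return 0

# Values of a and b for each row, as plain Python lists.
current = df[['a', 'b']].values.tolist()
# The same lists shifted down by one; the first row has no previous row.
previous = [None] + current[:-1]

# Call compare only where c == 1 and a previous row exists; otherwise use 0.
df['d'] = [
    compare(old, new) if flag == 1 and old is not None else 0
    for old, new, flag in zip(previous, current, df['c'])
]
```

This is a plain Python loop rather than a vectorized operation, so it trades speed for being able to reuse compare unchanged.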
Upvotes: 1