Reputation: 7732
Here is my example. I am grouping, ordering and detecting change from one row to another.
import pandas as pd
import datetime
my_df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                      'col2': [2, 2, 3, 2, 5, 5, 5],
                      'col3': [datetime.date(2023, 2, 1),
                               datetime.date(2023, 3, 1),
                               datetime.date(2023, 5, 1),
                               datetime.date(2023, 4, 1),
                               datetime.date(2023, 3, 1),
                               datetime.date(2023, 2, 1),
                               datetime.date(2023, 4, 1)]})
my_df_temp = my_df.sort_values(by=['col3']).groupby('col1')['col2'].apply(
    lambda x: x != x.shift(1)
).reset_index(name='col2_change')
Now I would like to bring the result back to my_df, i.e. I would like my_df to have a column col2_change.
Simple assignment will not work, because the rows of my_df_temp are in a different order than those of my_df: my_df['col2_change'] = my_df_temp.col2_change.values
One way I can do it is by ordering my_df by the two columns col1 and col3 and then simply assigning, but it looks a bit laborious. Is there an easier way to do it?
Upvotes: 1
Views: 57
Reputation: 13257
Code
As everyone seems to agree, you don't need apply. And in my opinion, you don't need transform either, because pandas Series comparison methods such as ne align on the index anyway.
my_df['col2_change'] = (
    my_df.sort_values('col3')
         .groupby('col1')['col2'].shift()
         .ne(my_df['col2'])
)
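To see the alignment at work, here is a minimal sketch on the question's data. The intermediate names prev and changed are introduced here purely for illustration, and the expected flags in the final comment follow from walking each col1 group in col3 order.

```python
import datetime

import pandas as pd

my_df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'col2': [2, 2, 3, 2, 5, 5, 5],
    'col3': [datetime.date(2023, 2, 1), datetime.date(2023, 3, 1),
             datetime.date(2023, 5, 1), datetime.date(2023, 4, 1),
             datetime.date(2023, 3, 1), datetime.date(2023, 2, 1),
             datetime.date(2023, 4, 1)],
})

# Previous col2 value within each col1 group, taken in col3 order.
# The sort reorders the rows, but each row keeps its original index label.
prev = my_df.sort_values('col3').groupby('col1')['col2'].shift()

# ne() matches rows by index label, not by position, so comparing against
# the unsorted column is safe. The NaN on the first row of each group
# compares as "not equal", i.e. as a change.
changed = prev.ne(my_df['col2'])

# Assigning a Series to a column aligns on the index as well.
my_df['col2_change'] = changed

print(my_df['col2_change'].tolist())
# [True, False, True, False, False, True, False]
```

Because both steps go by label, no reset_index or re-sorting of my_df is needed.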
Upvotes: 1
Reputation: 11522
Your method is actually quite good, but if you want a more concise method that also eliminates the overhead of resetting the index, you can do this:
import pandas as pd
import datetime
my_df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                      'col2': [2, 2, 3, 2, 5, 5, 5],
                      'col3': [datetime.date(2023, 2, 1),
                               datetime.date(2023, 3, 1),
                               datetime.date(2023, 5, 1),
                               datetime.date(2023, 4, 1),
                               datetime.date(2023, 3, 1),
                               datetime.date(2023, 2, 1),
                               datetime.date(2023, 4, 1)]})
my_df['col2_change'] = (
    my_df.sort_values(by='col3')
    .groupby('col1')['col2']
    .transform(lambda x: x != x.shift(1))
)
print(my_df)
which gives
  col1  col2        col3  col2_change
0    a     2  2023-02-01         True
1    a     2  2023-03-01        False
2    a     3  2023-05-01         True
3    a     2  2023-04-01        False
4    b     5  2023-03-01        False
5    b     5  2023-02-01         True
6    b     5  2023-04-01        False
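transform returns one value per input row, labelled with that row's index, so the result can be assigned straight back to the original dataframe without resetting the index, which is more efficient in many cases.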
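The role of the index here can be checked directly. The following is a sketch, not a benchmark: it only illustrates that label-based assignment puts every flag on the right row, whereas a positional assignment of the raw values would not. The names sorted_df, flags, aligned and positional are made up for this example.

```python
import datetime

import pandas as pd

my_df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'col2': [2, 2, 3, 2, 5, 5, 5],
    'col3': [datetime.date(2023, 2, 1), datetime.date(2023, 3, 1),
             datetime.date(2023, 5, 1), datetime.date(2023, 4, 1),
             datetime.date(2023, 3, 1), datetime.date(2023, 2, 1),
             datetime.date(2023, 4, 1)],
})

sorted_df = my_df.sort_values(by='col3')

# One flag per row of sorted_df, carrying the original index labels.
flags = sorted_df.groupby('col1')['col2'].transform(lambda x: x != x.shift(1))

# Label-based: reindexing to my_df's index restores the original row order.
# This is what my_df['col2_change'] = flags does implicitly.
aligned = flags.reindex(my_df.index).tolist()

# Position-based: the raw values stay in the transform's output order,
# so they would land on the wrong rows of my_df.
positional = flags.to_numpy().tolist()

print(aligned)                # [True, False, True, False, False, True, False]
print(aligned == positional)  # False
```

This is the same reason the .values assignment in the question does not work: it discards the labels that make the alignment possible.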
Upvotes: 2