Reputation: 9844
First things first, I have a data frame that has these columns:
issue_date | issue | special | group
Multiple rows can comprise the same group
. For each group, I want to get its maximum date:
date_current = history.groupby('group').agg({'issue_date' : [np.min, np.max]})
date_current = date_current.issue_date.amax
After that, I want to filter each group by its max_date-months
:
date_before = date_current.values - pd.Timedelta(weeks=4*n)
I.e., for each group, I want to discard rows where the column issue_date < date_before
:
hh = history[history['issue_date'] > date_before]
ValueError: Lengths must match to compare
This last line doesn't work though, since the the lengths don't match. This is expected because I have x lines in my data frame, but date_before has length equals to the number of groups in my data frame.
Given data, I'm wondering how I can perform this subtraction, or filtering, by groups. Do I have to iterate of the data frame somehow?
Upvotes: 2
Views: 3652
Reputation: 157
You can solve this in a similar manner as you attempted it.
I've created my own sample data as follows:
history
issue_date group
0 2014-01-02 1
1 2014-01-02 2
2 2016-02-04 3
3 2016-03-05 2
You use group_by and apply to do what you were attempting. First you definge the function you want to apply. Then group_by.apply will apply it to every group. In this case I've used n=1 to demonstrate the point:
def date_compare(df):
date_current = df.issue_date.max()
date_before = date_current - pd.Timedelta(weeks=4*1)
hh = df[df['issue_date'] > date_before]
return hh
hh = history.groupby('group').apply(date_compare)
issue_date group
group
1 0 2014-01-02 1
2 3 2016-03-05 2
3 2 2016-02-04 3
So the smaller date in group 2 has not survived the cut.
Hope that's helpful and that it follows the same logic you were going for.
Upvotes: 2
Reputation: 2110
I think your best option will be to merge your original df with date_current, but this will only work if you change your calculation of the date_before such that the group information isn't lost:
date_before = date_current - pd.Timedelta(weeks=4*n)
Then you can merge left on group and right on index(since you grouped on that before)
history = pd.merge(history, date_before.to_frame(), left_on='group', right_index=True)
Then your filter should work. The call of to_frame is nessesary because you can't merge a dataframe and a series.
Hope that helps.
Upvotes: 1