pceccon
pceccon

Reputation: 9844

Comparing values with groups - pandas

First things first, I have a data frame that has these columns:

issue_date | issue | special  |  group 

Multiple rows can comprise the same group. For each group, I want to get its maximum date:

date_current = history.groupby('group').agg({'issue_date' : [np.min, np.max]})
date_current = date_current.issue_date.amax

After that, I want to filter each group by its max_date-months:

date_before  = date_current.values - pd.Timedelta(weeks=4*n)

I.e., for each group, I want to discard rows where the column issue_date < date_before:

hh = history[history['issue_date'] > date_before]

ValueError: Lengths must match to compare

This last line doesn't work though, since the the lengths don't match. This is expected because I have x lines in my data frame, but date_before has length equals to the number of groups in my data frame.

Given data, I'm wondering how I can perform this subtraction, or filtering, by groups. Do I have to iterate of the data frame somehow?

Upvotes: 2

Views: 3652

Answers (2)

Federico Garza
Federico Garza

Reputation: 157

You can solve this in a similar manner as you attempted it.

I've created my own sample data as follows:

history

  issue_date  group
0 2014-01-02      1
1 2014-01-02      2
2 2016-02-04      3
3 2016-03-05      2

You use group_by and apply to do what you were attempting. First you definge the function you want to apply. Then group_by.apply will apply it to every group. In this case I've used n=1 to demonstrate the point:

def date_compare(df):

    date_current = df.issue_date.max()
    date_before  = date_current - pd.Timedelta(weeks=4*1)
    hh = df[df['issue_date'] > date_before]

    return hh

hh = history.groupby('group').apply(date_compare)

        issue_date  group
group                    
1     0 2014-01-02      1
2     3 2016-03-05      2
3     2 2016-02-04      3

So the smaller date in group 2 has not survived the cut.

Hope that's helpful and that it follows the same logic you were going for.

Upvotes: 2

P.Tillmann
P.Tillmann

Reputation: 2110

I think your best option will be to merge your original df with date_current, but this will only work if you change your calculation of the date_before such that the group information isn't lost:

date_before  = date_current - pd.Timedelta(weeks=4*n)

Then you can merge left on group and right on index(since you grouped on that before)

history = pd.merge(history, date_before.to_frame(), left_on='group', right_index=True)

Then your filter should work. The call of to_frame is nessesary because you can't merge a dataframe and a series.

Hope that helps.

Upvotes: 1

Related Questions