Muriel

Reputation: 591

Cannot change Pandas Groupby Object

I'm trying to resample each group of a pandas GroupBy object. The resampling works inside the loop, but the GroupBy object itself isn't modified. Do I need to create a new group or something?

This is my code:

grouped_by_product_comp = competitor_df.sort_values(['history_date']).groupby(['item_id'])
for name, group in grouped_by_product_comp:
    my_prod = name
    group = group.drop_duplicates(subset = 'history_date')
    group.set_index('history_date', inplace = True)
    group = group.asfreq('D',method='pad')
    print(group.head())
    break

my_group = grouped_by_product_comp.get_group(394846296)
print(my_group.head()) 

And this is my output (first the print inside the loop, then the print of get_group):

              id    item_id  competitor_id  competitor_price
history_date                                                  
2016-01-25    3504  394846296        2301745              1205
2016-01-26    3504  394846296        2301745              1205
2016-01-27    3504  394846296        2301745              1205
2016-01-28    3504  394846296        2301745              1205
2016-01-29    3504  394846296        2301745              1205

           id history_date    item_id  competitor_id  competitor_price
187116   3504   2016-01-25  394846296        2301745              1205
188119  17460   2016-02-23  394846296        2301745              1205
188945  28392   2016-03-17  394846296        2301745              1205
189063  29988   2016-03-20  394846296        2301745              1205
189477  35004   2016-03-31  394846296        2301745              1205

So the object didn't change outside the for loop. Should I somehow be telling the GroupBy object to change instead of the group? Thanks so much if you're reading this!

Upvotes: 1

Views: 86

Answers (1)

Ben.T

Reputation: 29635

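Iterating over a GroupBy yields each group as a separate DataFrame, so reassigning group inside the loop only rebinds that local variable; neither the GroupBy object nor competitor_df is modified. You can use apply instead of a for loop and assign the result to a new DataFrame (or back to the same one):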

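new_competitor_df = (competitor_df.sort_values(['history_date']).groupby(['item_id'])
                                  # resample each item's rows to one row per day
                                  .apply(lambda df_g: (df_g.drop_duplicates(subset='history_date')  # keep one row per date
                                                           .set_index('history_date')
                                                           .asfreq('D', method='pad')))  # forward-fill the missing days
                                  # drop the item_id index level that apply prepends
                                  .reset_index(0, drop=True))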
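For reference, here is a self-contained sketch of the same apply approach on a small hypothetical DataFrame (the values and the item ids 10 and 20 are made up). It deliberately differs from the code above in two ways: it selects the non-key columns before apply, because recent pandas versions deprecate apply operating on the grouping column, and it then moves item_id back out of the index with reset_index(level=0) instead of dropping that level. It also uses method='ffill', which is a synonym for 'pad' in asfreq.

```python
import pandas as pd

# Hypothetical toy data that reuses the question's column names.
competitor_df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'history_date': pd.to_datetime(['2016-01-25', '2016-01-28', '2016-01-28',
                                    '2016-01-26', '2016-01-29']),
    'item_id': [10, 10, 10, 20, 20],
    'competitor_id': [7, 7, 7, 8, 8],
    'competitor_price': [100, 110, 110, 200, 210],
})

# Every column except the grouping key.
value_cols = ['id', 'history_date', 'competitor_id', 'competitor_price']

new_competitor_df = (
    competitor_df.sort_values('history_date')
    .groupby('item_id')[value_cols]        # leave the key column out of each group
    .apply(lambda df_g: (df_g.drop_duplicates(subset='history_date')  # one row per date
                             .set_index('history_date')
                             .asfreq('D', method='ffill')))           # forward-fill missing days
    .reset_index(level=0)                  # turn the item_id index level back into a column
)

print(new_competitor_df[new_competitor_df['item_id'] == 10])
```

Each item now has one row per calendar day between its first and last observed date, with the missing days filled from the previous observation.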

Then you can select the rows of a single item, for example:

print(new_competitor_df[new_competitor_df['item_id'] == 394846296].head())
                id    item_id  competitor_id  competitor_price
history_date                                                  
2016-01-25    3504  394846296        2301745              1205
2016-01-26    3504  394846296        2301745              1205
2016-01-27    3504  394846296        2301745              1205
2016-01-28    3504  394846296        2301745              1205
2016-01-29    3504  394846296        2301745              1205

or get the same result with print(new_competitor_df.groupby('item_id').get_group(394846296).head())
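If you would rather keep the for loop from the question, a minimal sketch (again on hypothetical toy data) is to collect each processed group in a list and concatenate the pieces with pd.concat. The GroupBy object itself stays unchanged, which is exactly the behavior the question observed.

```python
import pandas as pd

# Hypothetical toy data that reuses the question's column names.
competitor_df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'history_date': pd.to_datetime(['2016-01-25', '2016-01-28', '2016-01-28',
                                    '2016-01-26', '2016-01-29']),
    'item_id': [10, 10, 10, 20, 20],
    'competitor_id': [7, 7, 7, 8, 8],
    'competitor_price': [100, 110, 110, 200, 210],
})

grouped = competitor_df.sort_values('history_date').groupby('item_id')

resampled = []  # processed groups are collected here
for _, group in grouped:
    # group is its own DataFrame; reassigning it only rebinds this local name
    group = (group.drop_duplicates(subset='history_date')
                  .set_index('history_date')
                  .asfreq('D', method='ffill'))
    resampled.append(group)

# Stitch the resampled groups back into one DataFrame.
new_competitor_df = pd.concat(resampled)

print(grouped.get_group(10))  # still the original rows: the GroupBy was never modified
print(new_competitor_df[new_competitor_df['item_id'] == 10])
```

Here item_id stays an ordinary column because each group carries it, and the forward fill copies it into the added days.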

Upvotes: 1
