Reputation: 591
I'm trying to resample a group in a Pandas object. The resampling works, but somehow the object isn't modified... Do I need to create a new group or something?
This is my code:
grouped_by_product_comp = competitor_df.sort_values(['history_date']).groupby(['item_id'])
for name, group in grouped_by_product_comp:
    my_prod = name
    group = group.drop_duplicates(subset='history_date')
    group.set_index('history_date', inplace=True)
    group = group.asfreq('D', method='pad')
    print(group.head())
    break
my_group = grouped_by_product_comp.get_group(394846296)
print(my_group.head())
And this is my output:
                id    item_id  competitor_id  competitor_price
history_date
2016-01-25    3504  394846296        2301745              1205
2016-01-26    3504  394846296        2301745              1205
2016-01-27    3504  394846296        2301745              1205
2016-01-28    3504  394846296        2301745              1205
2016-01-29    3504  394846296        2301745              1205
           id  history_date    item_id  competitor_id  competitor_price
187116   3504    2016-01-25  394846296        2301745              1205
188119  17460    2016-02-23  394846296        2301745              1205
188945  28392    2016-03-17  394846296        2301745              1205
189063  29988    2016-03-20  394846296        2301745              1205
189477  35004    2016-03-31  394846296        2301745              1205
So the object didn't change outside the for loop... Should I somehow be telling the Groupby Object to change instead of the group? Thanks so much if you're reading this!
Upvotes: 1
Views: 86
Reputation: 29635
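Inside the loop, each assignment only rebinds the local name group to a new DataFrame; the GroupBy object still works on the original competitor_df, so get_group returns the unmodified rows. You can use apply instead of a for loop and assign the result to a new DataFrame (or back to the same name):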
new_competitor_df = (competitor_df.sort_values(['history_date'])
                     .groupby(['item_id'])
                     .apply(lambda df_g: (df_g.drop_duplicates(subset='history_date')
                                          .set_index('history_date')
                                          .asfreq('D', method='pad')))
                     .reset_index(0, drop=True))
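If you want to try the idea without the original data, here is a minimal sketch on a made-up frame. The names toy_df, pad_daily and new_df are assumptions for illustration and are not from the question. It is a small variation on the snippet above: it selects the non-key columns before apply, because recent pandas versions warn about, or no longer pass, the grouping column to the applied function, and then restores item_id from the index.

```python
import pandas as pd

# Made-up data: two items with irregular, partly duplicated dates.
toy_df = pd.DataFrame({
    "history_date": pd.to_datetime(
        ["2016-01-01", "2016-01-04", "2016-01-04", "2016-01-02", "2016-01-03"]
    ),
    "item_id": [1, 1, 1, 2, 2],
    "competitor_price": [100, 110, 110, 200, 210],
})


def pad_daily(df_g):
    # Same steps as the lambda: drop duplicate dates, index by date,
    # then fill the missing days with the last known row.
    return (df_g.drop_duplicates(subset="history_date")
                .set_index("history_date")
                .asfreq("D", method="pad"))


new_df = (toy_df.sort_values("history_date")
                .groupby("item_id")[["history_date", "competitor_price"]]
                .apply(pad_daily)
                # The result is indexed by (item_id, history_date);
                # move item_id back into a regular column.
                .reset_index(level="item_id"))

print(new_df[new_df["item_id"] == 1])
```

With the real competitor_df, the equivalent result is new_competitor_df from the snippet above.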
Then you can get all the data you want by doing for example:
print(new_competitor_df[new_competitor_df['item_id'] == 394846296].head())
                id    item_id  competitor_id  competitor_price
history_date
2016-01-25    3504  394846296        2301745              1205
2016-01-26    3504  394846296        2301745              1205
2016-01-27    3504  394846296        2301745              1205
2016-01-28    3504  394846296        2301745              1205
2016-01-29    3504  394846296        2301745              1205
You get the same result with get_group:
print(new_competitor_df.groupby(['item_id']).get_group(394846296).head())
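If you would rather keep the explicit loop, the per-group results have to be stored somewhere, since reassigning the loop variable does not write back into the GroupBy object. A minimal sketch on made-up data follows. The names toy_df, resampled and result are assumptions for illustration and are not from the question.

```python
import pandas as pd

# Made-up data: two items with irregular, partly duplicated dates.
toy_df = pd.DataFrame({
    "history_date": pd.to_datetime(
        ["2016-01-01", "2016-01-04", "2016-01-04", "2016-01-02", "2016-01-03"]
    ),
    "item_id": [1, 1, 1, 2, 2],
    "competitor_price": [100, 110, 110, 200, 210],
})

resampled = {}
for name, group in toy_df.sort_values("history_date").groupby("item_id"):
    # Assigning to `group` would only rebind the loop variable,
    # so keep each processed frame in a dict keyed by item_id.
    resampled[name] = (group.drop_duplicates(subset="history_date")
                            .set_index("history_date")
                            .asfreq("D", method="pad"))

# Stitch the per-item frames back into a single DataFrame.
result = pd.concat(list(resampled.values()))

print(resampled[1].head())
```

Here resampled[1] plays the role of get_group(1), but on the resampled data rather than the original rows.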
Upvotes: 1