Reputation: 53
I'd like to get the mean of value between 2 dates grouped by shop.
In fact I've a first xlsx with the sells by shop and date
shop sell date
a 100 2000
a 122 2001
a 300 2002
b 55 2000
b 245 2001
b 1156 2002
And I've another file with the start and end date for each shop
shop start stop
a 2000 2002
a 2000 2001
b 2000 2000
And so I'd like to get the sell mean between each date from the 2nd file.
I try something like this but I got a list of Df and it's not pretty optimize for me
dfend = []
for i in df2.values:
filt1 = df.shop == i[0]
filt2 = df.date >= i[1]
filt3 = df.date <= i[2]
dfgrouped = df.where(filt1 & filt2 & filt3).groupby('shop').agg(mean = ('sell','mean'), begin = ('date','min'), end = ('date', 'max'))
dfend.append(dfgrouped)
Someone can help me ?
Thx a lot
Upvotes: 0
Views: 62
Reputation: 59579
merge
the two DataFrames on 'shop'. Then you can check the date condition using between
to filter down to the rows that count. Finally groupby
+ sum
. (This assumes your second df is unique)
m = df2.merge(df1, how='left')
(m[m['date'].between(m['start'], m['stop'])]
.groupby(['shop', 'start', 'stop'])['sell'].mean()
.reset_index())
# shop start stop sell
#0 a 2000 2001 111
#1 a 2000 2002 174
#2 b 2000 2000 55
If there are some rows in df2
that will have no qualifying rows in df1
, then instead use mask
so that they still get a row after the groupby
(this is also the reason why df2
is the left DataFrame in the merge). Here I added an extra row
print(df2)
# shop start stop
#0 a 2000 2002
#1 a 2000 2001
#2 b 2000 2000
#3 e 1999 2011
m = df2.merge(df1, how='left')
(m.where(m['date'].between(m['start'], m['stop']))
.groupby([m.shop, m.start, m.stop])['sell'].mean()
.reset_index())
# shop start stop sell
#0 a 2000 2001 111.0
#1 a 2000 2002 174.0
#2 b 2000 2000 55.0
#3 e 1999 2011 NaN
Upvotes: 1