tomferrari
tomferrari

Reputation: 53

Python - get the mean of value between 2 date

I'd like to get the mean of value between 2 dates grouped by shop.

In fact I've a first xlsx with the sells by shop and date

shop sell date
a    100  2000
a    122  2001
a    300  2002
b    55   2000
b    245  2001
b    1156 2002

And I've another file with the start and end date for each shop

shop start stop
a    2000  2002
a    2000  2001
b    2000  2000

And so I'd like to get the sell mean between each date from the 2nd file.

I try something like this but I got a list of Df and it's not pretty optimize for me

dfend = []

for i in df2.values:
    filt1 = df.shop == i[0]
    filt2 = df.date >= i[1]
    filt3 = df.date <= i[2]
    dfgrouped = df.where(filt1 & filt2 & filt3).groupby('shop').agg(mean = ('sell','mean'), begin = ('date','min'), end = ('date', 'max'))
    dfend.append(dfgrouped)

Someone can help me ?

Thx a lot

Upvotes: 0

Views: 62

Answers (1)

ALollz
ALollz

Reputation: 59579

merge the two DataFrames on 'shop'. Then you can check the date condition using between to filter down to the rows that count. Finally groupby + sum. (This assumes your second df is unique)

m = df2.merge(df1, how='left')

(m[m['date'].between(m['start'], m['stop'])]
   .groupby(['shop', 'start', 'stop'])['sell'].mean()
   .reset_index())

#  shop  start  stop  sell
#0    a   2000  2001   111
#1    a   2000  2002   174
#2    b   2000  2000    55

If there are some rows in df2 that will have no qualifying rows in df1, then instead use mask so that they still get a row after the groupby (this is also the reason why df2 is the left DataFrame in the merge). Here I added an extra row

print(df2)
#  shop  start  stop
#0    a   2000  2002
#1    a   2000  2001
#2    b   2000  2000
#3    e   1999  2011

m = df2.merge(df1, how='left')

(m.where(m['date'].between(m['start'], m['stop']))
  .groupby([m.shop, m.start, m.stop])['sell'].mean()
  .reset_index())

#  shop  start  stop   sell
#0    a   2000  2001  111.0
#1    a   2000  2002  174.0
#2    b   2000  2000   55.0
#3    e   1999  2011    NaN

Upvotes: 1

Related Questions