pandas groupby with condition depending on same column

Question

I have a dataframe with ID, date and number columns and would like to create a new column that takes the mean of all numbers for this specific ID BUT only includes the numbers in the mean where date is smaller than the date of this row. How would I do this?

df = (pd.DataFrame({'ID':['1','1','1','1','2','2'],'number':['1','4','1','4','2','5'],
               'date':['2021-10-19','2021-10-16','2021-10-16','2021-10-15','2021-10-19','2021-10-10']})
      .assign(date = lambda x: pd.to_datetime(x.date))
      .assign(mean_no_from_previous_dts = lambda x: x[x.date


this is what i would like to get as output
    ID  number  date    mean_no_from_previous_dts
0   1   1   2021-10-19  3.0 = mean(4+1+4)
1   1   4   2021-10-16  2.5 = mean(4+1)
2   1   1   2021-10-16  4.0 = mean(1)
3   1   4   2021-10-15  0.0 = 0 (as it's the first entry for this date and ID - this number doesnt matter, can e something else)
4   2   2   2021-10-19  5.0 = mean(5)
5   2   5   2021-10-10  0.0 = 0 (as it's the first entry for this date and ID)

so for example the first entry of the column mean_no_from_previous_dts is the mean of (4+1+4): the first 4 comes from the column number and the 2nd row because 2021-10-16 (date in the 2nd row) is smaller than 2021-10-19 (date in the 1st row). The 1 comes from the 3rd row because 2021-10-16 is smaller than 2021-10-19. The second 4 comes from the 4th row because 2021-10-15 is smaller than 2021-10-19. This is for ID = 1 the the same for ID = 2

pandas groupby with condition depending on same column

Answers (1)

Related Questions