Reputation: 65
For each day, I have different colors which are used. I need to create a new column which looks at the ratio of users to the first day for each color.
For example, for color blue, the earliest date is 2020-01-01
and the corresponding 'users' value is 100. So, for 2020-01-02
, the value I want should be 102/100 = 1.02
raw_data = [
{'date': '2020-01-01', 'color': 'blue', 'users': 100},
{'date': '2020-01-02', 'color': 'blue', 'users': 102},
{'date': '2020-01-03', 'color': 'blue', 'users': 104},
{'date': '2020-01-04', 'color': 'blue', 'users': 98},
{'date': '2020-01-02', 'color': 'red', 'users': 100},
{'date': '2020-01-03', 'color': 'red', 'users': 107},
{'date': '2020-01-04', 'color': 'red', 'users': 114},
{'date': '2020-01-05', 'color': 'red', 'users': 150},
]
So far, I know I can get the min date for each color using the code below, but not sure what to do next
grouped = df.groupby('color')['date']
min = grouped.min()
Upvotes: 2
Views: 48
Reputation: 71707
sort
the values of the dataframe on Date
then groupby
on color
and transform the column users
using first
, then use Series.div
to divide users
by this transformed column to get the ratio:
df['date'] = pd.to_datetime(df['date'])
df['ratio'] = df['users'].div(
df.sort_values('date').groupby('color')['users'].transform('first')
)
date color users ratio
0 2020-01-01 blue 100 1.00
1 2020-01-02 blue 102 1.02
2 2020-01-03 blue 104 1.04
3 2020-01-04 blue 98 0.98
4 2020-01-02 red 100 1.00
5 2020-01-03 red 107 1.07
6 2020-01-04 red 114 1.14
7 2020-01-05 red 150 1.50
Upvotes: 3