Reputation: 3889
Suppose my data look as follows:
import datetime
import pandas as pd
df = pd.DataFrame({'datetime': [datetime.datetime(2024, 11, 27, 0), datetime.datetime(2024, 11, 27, 1), datetime.datetime(2024, 11, 28, 0),
datetime.datetime(2024, 11, 28, 1), datetime.datetime(2024, 11, 28, 2)],
'product': ['Apple', 'Banana', 'Banana', 'Apple', 'Banana']})
             datetime product
0 2024-11-27 00:00:00   Apple
1 2024-11-27 01:00:00  Banana
2 2024-11-28 00:00:00  Banana
3 2024-11-28 01:00:00   Apple
4 2024-11-28 02:00:00  Banana
All I want is to plot the relative frequencies of the products sold on each day. In this example, that is 1/2 (50%) apples and 1/2 bananas on 2024-11-27, and 1/3 apples and 2/3 bananas on 2024-11-28.
What I managed to do:
absolute_frequencies = df.groupby([pd.Grouper(key='datetime', freq='D'), 'product']).size().reset_index(name='count')
total_counts = absolute_frequencies.groupby('datetime')['count'].transform('sum')
absolute_frequencies['relative_frequency'] = absolute_frequencies['count'] / total_counts
absolute_frequencies.pivot(index='datetime', columns='product', values='relative_frequency').plot()
I am pretty confident there is a much simpler way, since for the absolute frequencies I can simply use:
df.groupby([pd.Grouper(key='datetime', freq='D'), 'product']).size().unstack('product').plot(kind='line')
Upvotes: 2
Views: 64
Reputation: 3490
1. Group by day and product.
2. Count the number of occurrences of each product per day.
3. Normalize the counts per day, i.e., convert them to relative frequencies by dividing by the sum of counts for that day.
4. Convert the product level into separate columns, one per product.
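As a side note, steps 2 and 3 can probably be merged, because `value_counts(normalize=True)` on a grouped column already returns per-group shares. The following is only a minimal sketch and not the code used in the rest of this answer. It assumes the `df` from the question, and it truncates timestamps with `dt.normalize()` instead of `pd.Grouper`, which is my own substitution. The name `shares` is made up for illustration.

```python
import datetime
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'datetime': [datetime.datetime(2024, 11, 27, 0), datetime.datetime(2024, 11, 27, 1),
                 datetime.datetime(2024, 11, 28, 0), datetime.datetime(2024, 11, 28, 1),
                 datetime.datetime(2024, 11, 28, 2)],
    'product': ['Apple', 'Banana', 'Banana', 'Apple', 'Banana'],
})

# Truncate each timestamp to midnight so that all sales of one day share a key
day = df['datetime'].dt.normalize()

# value_counts(normalize=True) yields the share of each product within each day;
# unstack() then moves the product level into columns
shares = df.groupby(day)['product'].value_counts(normalize=True).unstack()

print(shares)
```

Calling `shares.plot()` afterwards should give one line per product, in the same way as the absolute-frequency one-liner from the question.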
import datetime
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'datetime': [datetime.datetime(2024, 11, 27, 0), datetime.datetime(2024, 11, 27, 1), datetime.datetime(2024, 11, 28, 0),
datetime.datetime(2024, 11, 28, 1), datetime.datetime(2024, 11, 28, 2)],
'product': ['Apple', 'Banana', 'Banana', 'Apple', 'Banana']})
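# group_keys=False keeps the original (datetime, product) index; without it,
# recent pandas versions prepend the group key as an extra index level
relative_frequencies = df.groupby([pd.Grouper(key='datetime', freq='D'), 'product']) \
.size() \
.groupby(level=0, group_keys=False) \
.apply(lambda x: x / x.sum()) \
.unstack('product')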
print(relative_frequencies)
ax = relative_frequencies.plot.bar(rot=45, figsize=(10, 6))
date_labels = [x.strftime('%b %d') for x in relative_frequencies.index.get_level_values(0)]
ax.set_xticklabels(date_labels, rotation=45)
# Optional: add gridlines to improve readability
ax.grid(True)
ax.set_title('Relative Frequencies of Products Sold per Day')
ax.set_xlabel('Date')
ax.set_ylabel('Relative Frequency')
plt.tight_layout()
plt.show()
Output
product        Apple    Banana
datetime
2024-11-27  0.500000  0.500000
2024-11-28  0.333333  0.666667
Edited plot as per comment
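If an even shorter route is wanted, `pd.crosstab` with `normalize='index'` might be worth a try, since it builds the day-by-product table and divides each row by its total in one call. This is a sketch under the assumption that the data look like the `df` in the question. Using `dt.normalize()` to get the day is my choice here, not something taken from the code above, and `relative` is just an illustrative name.

```python
import datetime
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'datetime': [datetime.datetime(2024, 11, 27, 0), datetime.datetime(2024, 11, 27, 1),
                 datetime.datetime(2024, 11, 28, 0), datetime.datetime(2024, 11, 28, 1),
                 datetime.datetime(2024, 11, 28, 2)],
    'product': ['Apple', 'Banana', 'Banana', 'Apple', 'Banana'],
})

# Rows: calendar day (timestamp truncated to midnight); columns: product.
# normalize='index' divides every row by its own sum, giving per-day shares.
relative = pd.crosstab(df['datetime'].dt.normalize(), df['product'], normalize='index')

print(relative)
```

The result has one row per day and one column per product, so `relative.plot()` or `relative.plot.bar()` can be applied to it directly.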
Upvotes: 0