I need to plot the frequency of items by date. My CSV contains three columns: one for Date, one for Name & Surname, and another for Birthday.
I am interested in plotting the number of people recorded on each date. My expected output would be:
Date Count
0 01/01/2018 9
1 01/02/2018 12
2 01/03/2018 6
3 01/04/2018 4
4 01/05/2018 5
.. ... ...
.. 02/27/2020 122
.. 02/28/2020 84
I produced the table above as follows:
by_date = df.groupby(df['Date']).size().reset_index(name='Count')
Date is a column in my CSV file, but Count is not. This is why I am having difficulty drawing a line plot.
How can I plot the frequency as a list of numbers or a column?
Upvotes: 1
Views: 161
Reputation: 93161
Although not strictly required, you should convert the Date column to Timestamp for easier analysis in later steps:
df['Date'] = pd.to_datetime(df['Date'])
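If parsing is slow or ambiguous, passing an explicit format may help. The sketch below assumes the MM/DD/YYYY layout shown in the question's table (02/27/2020 only makes sense as month first); the `sample` frame is a hypothetical stand-in for the real CSV.

```python
import pandas as pd

# Hypothetical sample in the MM/DD/YYYY layout shown in the question
sample = pd.DataFrame({"Date": ["01/01/2018", "01/02/2018", "02/27/2020"]})

# An explicit format avoids day/month ambiguity; errors="coerce" turns
# unparseable strings into NaT instead of raising an exception
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y", errors="coerce")
```

Any row that ends up as NaT can then be inspected with `sample[sample["Date"].isna()]` before counting.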
Now, to your question. To count how many births there are per day, you can use value_counts:
births = df['Date'].value_counts()
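Since the question asks for a line plot, here is a minimal sketch of one way to draw it from those counts. The small `df` is made-up data standing in for the real CSV, and the title and axis labels are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one row per recorded person
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-01-01", "2018-01-01",
        "2018-01-02",
        "2018-01-03", "2018-01-03", "2018-01-03",
    ])
})

# value_counts orders by frequency, so sort by date before drawing a line
births = df["Date"].value_counts().sort_index()

# Series.plot uses the index (the dates) as the x axis
ax = births.plot(kind="line", title="Records per date")
ax.set_xlabel("Date")
ax.set_ylabel("Count")
```

The by_date frame from the question should work the same way with `by_date.plot(x='Date', y='Count')`, provided Date has been converted to Timestamp first.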
But you don't even have to do that to plot a histogram! Use hist:
import matplotlib.dates as mdates
# Major ticks at each year, minor ticks at each month
year = mdates.YearLocator()
month = mdates.MonthLocator()
formatter = mdates.ConciseDateFormatter(year)
# Histogram of the dates themselves; no pre-computed counts needed
ax = df['Date'].hist()
ax.set_title('# of births')
ax.xaxis.set_major_locator(year)
ax.xaxis.set_minor_locator(month)
ax.xaxis.set_major_formatter(formatter)
Result (from random data):
Upvotes: 1