Reputation: 1400
I have a DataFrame with dates from 1970 to 2018, and I want to plot the frequency of occurrences from 2016 to 2017.
In[95]: df['last_payout'].dtypes
Out[95]: dtype('<M8[ns]')
The data is stored in this format:
In[96]: df['last_payout'].head()
Out[96]:
0 1970-01-01
1 1970-01-01
2 1970-01-01
3 1970-01-01
4 1970-01-01
I plot this by year using groupby and count:
In[97]: df['last_payout'].groupby(df['last_payout'].dt.year).count().plot(kind="bar")
I want to get this plot for a specific date range. I tried using df['last_payout'].dt.year > 2016, but the resulting plot was not what I wanted.
How do I get the plot for specific date range?
Upvotes: 1
Views: 2224
Reputation: 862761
I think you need to filter first with between and boolean indexing:
import pandas as pd

rng = pd.date_range('2015-04-03', periods=10, freq='7M')
df = pd.DataFrame({'last_payout': rng})
print (df)
last_payout
0 2015-04-30
1 2015-11-30
2 2016-06-30
3 2017-01-31
4 2017-08-31
5 2018-03-31
6 2018-10-31
7 2019-05-31
8 2019-12-31
9 2020-07-31
(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
.groupby(df['last_payout'].dt.year)
.count()
.plot(kind="bar")
)
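To make the filtering step concrete, here is a small sketch on hypothetical sample dates (not the asker's data). It shows that between keeps both endpoint years by default and that the resulting boolean mask, passed to .loc, keeps only the matching rows before grouping. The .plot(kind="bar") call is left out so the counts themselves are visible.

```python
import pandas as pd

# Hypothetical sample data: a few payout dates spanning several years
df = pd.DataFrame({'last_payout': pd.to_datetime([
    '2015-11-30', '2016-06-30', '2017-01-31', '2017-08-31', '2018-03-31',
])})

years = df['last_payout'].dt.year
# between() is inclusive on both ends by default, so 2016 and 2017 are both kept
mask = years.between(2016, 2017)
print(mask.tolist())  # [False, True, True, True, False]

# Boolean indexing with .loc keeps only the matching rows of the column;
# groupby aligns the full-length `years` Series to the filtered rows by index
counts = df.loc[mask, 'last_payout'].groupby(years).count()
print(counts.index.tolist(), counts.tolist())  # [2016, 2017] [1, 2]
```

Appending .plot(kind="bar") to counts would draw one bar per remaining year.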
Alternative solution:
(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
.dt.year
.value_counts()
.sort_index()
.plot(kind="bar")
)
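A quick way to convince yourself that both solutions count the same thing is to compare them on some hypothetical sample dates. Note the sort_index step: value_counts orders its result by frequency, so without it the bars would not be in chronological order.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'last_payout': pd.to_datetime([
    '2015-11-30', '2016-06-30', '2017-01-31', '2017-08-31', '2018-03-31',
])})

selected = df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']

# Solution 1: group the filtered dates by year and count them
by_groupby = selected.groupby(selected.dt.year).count()

# Solution 2: count each year directly, then sort by year (not by frequency)
by_value_counts = selected.dt.year.value_counts().sort_index()

print(by_groupby.tolist(), by_value_counts.tolist())  # [1, 2] [1, 2]
```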
EDIT: To count by month and year, convert the datetimes to monthly periods with to_period:
(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
.dt.to_period('M')
.value_counts()
.sort_index()
.plot(kind="bar")
)
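As a sketch of what to_period('M') produces, assuming a few hypothetical dates where two fall in the same month: each datetime becomes a monthly Period, so dates within the same month are counted together and the bars are labelled like 2016-06.

```python
import pandas as pd

# Hypothetical sample data: two dates in June 2016
df = pd.DataFrame({'last_payout': pd.to_datetime([
    '2015-11-30', '2016-06-05', '2016-06-20', '2017-01-31', '2018-03-31',
])})

monthly = (df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
           .dt.to_period('M')   # each datetime becomes a monthly Period, e.g. 2016-06
           .value_counts()
           .sort_index())       # chronological order of the months

print([str(p) for p in monthly.index], monthly.tolist())
# ['2016-06', '2017-01'] [2, 1]
```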
Upvotes: 2
Reputation: 76297
Note that df['last_payout'].dt.year > 2016 just returns a boolean series, so plotting it will indeed show a bar chart of the number of dates for which the condition is or is not true.
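To illustrate the point, here is a sketch on hypothetical data. It assumes the condition was used in place of dt.year as the groupby key (the exact call from the question is not shown); grouping by a boolean leaves only two groups, False and True. Using the same condition as a filter instead keeps the dates, which can then be grouped by year.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'last_payout': pd.to_datetime([
    '1970-01-01', '1970-01-01', '2016-06-30', '2017-01-31', '2017-08-31',
])})

cond = df['last_payout'].dt.year > 2016   # boolean Series, one value per row

# Condition as the groupby key: only two groups, False and True
by_condition = df['last_payout'].groupby(cond).count()
print(by_condition.index.tolist(), by_condition.tolist())  # [False, True] [3, 2]

# Condition as a filter: the dates survive and can be grouped by year
filtered = df.loc[cond, 'last_payout']
by_year = filtered.groupby(filtered.dt.year).count()
print(by_year.index.tolist(), by_year.tolist())  # [2017] [2]
```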
Try first creating a DataFrame with only the relevant rows:
relevant_df = df[(df['last_payout'].dt.year >= 2016) & (df['last_payout'].dt.year <= 2017)]
(Use strict or non-strict inequalities depending on what you want, of course.)
Then plot it:
relevant_df['last_payout'].groupby(relevant_df['last_payout'].dt.year).count().plot(kind="bar")
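If the range should be given as actual dates rather than whole years, the same boolean-indexing idea works with timestamp comparisons. This is a sketch on hypothetical data; start and end are illustrative names. Using an exclusive upper bound of 2018-01-01 keeps every moment of 2017, including times of day on 2017-12-31.

```python
import pandas as pd

# Hypothetical sample data, including a time of day on the last day of 2017
df = pd.DataFrame({'last_payout': pd.to_datetime([
    '2015-12-31 00:00', '2016-01-01 00:00', '2016-06-30 00:00',
    '2017-12-31 18:00', '2018-01-01 00:00',
])})

start = pd.Timestamp('2016-01-01')  # inclusive lower bound
end = pd.Timestamp('2018-01-01')    # exclusive upper bound: covers all of 2017
in_range = df[(df['last_payout'] >= start) & (df['last_payout'] < end)]

counts = in_range['last_payout'].groupby(in_range['last_payout'].dt.year).count()
print(counts.index.tolist(), counts.tolist())  # [2016, 2017] [2, 1]
```

As before, appending .plot(kind="bar") to counts draws the chart.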
Upvotes: 1