user40

Reputation: 1400

Plot frequency of dates occurring in an interval in a pandas dataframe

I have a dataframe with dates from 1970 to 2018, and I want to plot the frequency of occurrences for the years 2016 to 2017.

In[95]: df['last_payout'].dtypes
Out[95]: dtype('<M8[ns]')

The data is stored in this format:

In[96]: df['last_payout'].head
Out[96]: <bound method NDFrame.head of 0         1970-01-01
1         1970-01-01
2         1970-01-01
3         1970-01-01
4         1970-01-01

I plot this by year using groupby and count:

 In[97]: df['last_payout'].groupby(df['last_payout'].dt.year).count().plot(kind="bar")

[image: bar chart of counts per year]

I want this plot for a specific date range. I tried using df['last_payout'].dt.year > 2016, but I got this:

[image: bar chart of the resulting True/False counts]

How do I get the plot for a specific date range?

Upvotes: 1

Views: 2224

Answers (2)

jezrael

Reputation: 862761

I think you need to filter first, using between and boolean indexing:

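import pandas as pd

# sample data; on pandas 2.2+ the month-end alias is 'ME', so use freq='7ME' there
rng = pd.date_range('2015-04-03', periods=10, freq='7M')
df = pd.DataFrame({'last_payout': rng})
print(df)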
  last_payout
0  2015-04-30
1  2015-11-30
2  2016-06-30
3  2017-01-31
4  2017-08-31
5  2018-03-31
6  2018-10-31
7  2019-05-31
8  2019-12-31
9  2020-07-31

(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
       .groupby(df['last_payout'].dt.year)
       .count()
       .plot(kind="bar")
)
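
To check the numbers behind the bars without plotting, here is a minimal sketch. It assumes the same sample dates as the frame printed above (typed in by hand here so the snippet runs on its own); the names mask, filtered and counts are only illustrative.

```python
import pandas as pd

# Sample dates copied from the frame printed above.
df = pd.DataFrame({'last_payout': pd.to_datetime([
    '2015-04-30', '2015-11-30', '2016-06-30', '2017-01-31', '2017-08-31',
    '2018-03-31', '2018-10-31', '2019-05-31', '2019-12-31', '2020-07-31',
])})

# between() is inclusive on both ends by default, so 2016 and 2017 are kept.
mask = df['last_payout'].dt.year.between(2016, 2017)
filtered = df.loc[mask, 'last_payout']

# Count the rows per year; these counts are the bar heights.
counts = filtered.groupby(filtered.dt.year).count()
print(counts)
```

Appending .plot(kind="bar") to counts should give the same chart as the chained version.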

Alternative solution:

(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
         .dt.year
         .value_counts()
         .sort_index()
         .plot(kind="bar")
)


EDIT: To count by month and year, convert the datetimes to monthly periods with to_period:

(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
         .dt.to_period('M')
         .value_counts()
         .sort_index()
         .plot(kind="bar")
)
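
If the range should be bounded by exact dates rather than whole years, the same idea seems to work with between applied to the datetime column itself. This is only a sketch: the start and end values below are hypothetical bounds picked for illustration, and the sample dates are copied from the frame printed earlier.

```python
import pandas as pd

# Sample dates copied from the frame printed earlier in this answer.
df = pd.DataFrame({'last_payout': pd.to_datetime([
    '2015-04-30', '2015-11-30', '2016-06-30', '2017-01-31', '2017-08-31',
    '2018-03-31', '2018-10-31', '2019-05-31', '2019-12-31', '2020-07-31',
])})

# Hypothetical bounds, chosen only for illustration.
start = pd.Timestamp('2016-06-01')
end = pd.Timestamp('2017-06-30')

# Series.between is inclusive on both ends by default.
in_range = df.loc[df['last_payout'].between(start, end), 'last_payout']

# One count per calendar month that actually contains data.
monthly = in_range.dt.to_period('M').value_counts().sort_index()
print(monthly)
# monthly.plot(kind="bar") would draw one bar per month listed above
```

Note that months with no rows do not appear in the result, so the bar chart skips them instead of showing empty bars.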

Upvotes: 2

Ami Tavory

Reputation: 76297

Note that

df['last_payout'].dt.year > 2016

just returns a boolean Series, so plotting it will indeed show a bar chart of the number of dates for which the condition is true or false.
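
A small sketch of the difference, using made-up dates (the Series s and its values are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Made-up dates, for illustration only.
s = pd.Series(
    pd.to_datetime(['1970-01-01', '2016-05-01', '2017-03-01', '2017-09-01']),
    name='last_payout',
)

# The comparison yields a boolean Series, not the dates themselves.
mask = s.dt.year > 2016
print(mask.tolist())  # [False, False, True, True]

# Counting the mask tallies True/False values...
bool_counts = mask.value_counts().to_dict()
print(bool_counts)

# ...whereas indexing with the mask keeps only the matching dates.
year_counts = s[mask].dt.year.value_counts().to_dict()
print(year_counts)
```

So the comparison is meant to be used as a filter on the data, not as the thing being plotted.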


Try creating the relevant DataFrame first:

relevant_df = df[(df['last_payout'].dt.year >= 2016) & (df['last_payout'].dt.year <= 2017)]

(Use strict or non-strict inequalities depending on the range you want, of course.)

Then plot from it:

relevant_df['last_payout'].groupby(relevant_df['last_payout'].dt.year).count().plot(kind="bar")

Upvotes: 1
