Reputation: 1
I am trying to plot yearly data that is centred on the winter months (i.e. the data for each year starts in June and ends in June of the following year). The dataframes look something like this:
         Date  Rate
1  2018-06-18   5.0
2  2018-06-25   7.1
3  2018-07-02   7.6
4  2018-07-09   4.3
5  2018-07-16   8.9
I would like to compare the variation between years by plotting the data for each year as a separate trace on the same graph (2017/18, 2018/19, etc.), using only the day and month part of the date.
Understandably, plotting the data just gives a standard time series, where each trace flows into the next with the x axis extended over the number of years.
I saw fig.update_layout(xaxis_tickformat="%d-%B") suggested elsewhere, but that only removes the year from the x-axis labels and doesn't affect how the data itself is plotted: the lines are not overlaid on each other.
As an alternative, I tried removing the year before plotting the data using df["date"] = df['date'].dt.strftime('%m/%d'). Unfortunately, this changes the dates to strings, so they aren't plotted properly either.
While searching for a solution, I found this approach, where the year is removed and the result is converted back into a date so that every year is reset to 1900. A modified version, in which I added a date increment so that the winter isn't split, works and gives me what I want, but it seems like an awfully messy way of doing things.
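For illustration, the re-basing idea described above might look roughly like the sketch below. This is not the linked code. The helper name rebase_to_reference_season, the start_month parameter and the reference years 1999/2000 (used here instead of 1900) are assumptions made for the example. 2000 is chosen because it is a leap year, so a 29 February in the data still maps to a valid date.

```python
import pandas as pd


def rebase_to_reference_season(dates: pd.Series, start_month: int = 6) -> pd.Series:
    """Map every date onto a single reference season (hypothetical helper).

    Months from start_month onwards are placed in 1999 and earlier months
    in 2000, so a June-to-June season stays contiguous on the x axis.
    """
    parts = pd.DataFrame({
        # 1999 for June..December, 2000 for January..May (assumed reference years)
        "year": 1999 + (dates.dt.month < start_month).astype(int),
        "month": dates.dt.month,
        "day": dates.dt.day,
    })
    # pandas assembles datetimes from year/month/day columns
    return pd.to_datetime(parts)


df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-18", "2018-12-31", "2019-01-07", "2019-05-27"]),
    "Rate": [5.0, 7.1, 7.6, 4.3],
})
df["season_date"] = rebase_to_reference_season(df["Date"])
```

The season_date column keeps a real datetime dtype, so it can be used as the x values of a trace and combined with a day-month tick format.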
It feels like such a trivial issue, but the (elegant) solution is evading me!
EDIT:
This is the type of graph I am trying to create
Upvotes: 0
Views: 1568
Reputation: 35115
This may not be the best way, but one approach is to use the dates of the first year as the x-axis for every trace and plot the values of each subsequent year against them. The start and end dates of each period are kept in two lists, which are looped over to extract the rows for each year.
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Sample data: one value per day from June 2015 to June 2021
date_rng = pd.date_range('2015-06-01', '2021-06-01', freq='1d')
rating = np.random.randint(0, 10, len(date_rng)) + np.random.rand(len(date_rng))
df = pd.DataFrame({'Date': date_rng, 'Rate': rating})

fig = go.Figure()

# Start and end of each June-to-June period
start = ['2015-06-01', '2016-06-01', '2017-06-01', '2018-06-01', '2019-06-01', '2020-06-01']
end = ['2016-06-01', '2017-06-01', '2018-06-01', '2019-06-01', '2020-06-01', '2021-06-01']

for s, e in zip(start, end):
    tmp = df[(df['Date'] >= s) & (df['Date'] <= e)]
    # Plot every period against the first period's dates so the traces overlay
    fig.add_trace(go.Scatter(x=date_rng[:len(tmp)],
                             y=tmp['Rate'],
                             name=f'{s[:4]}/{e[2:4]}',  # e.g. 2015/16
                             mode='lines',
                             ))

fig.update_layout(height=600, xaxis_tickformat='%d-%b')
fig.update_xaxes(type='date')
fig.show()
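If hard-coding the start and end lists becomes tedious, the period label and a shared x-axis date could probably be derived from each date instead. The following is only a sketch under assumptions: the helper add_season_columns, its start_month and ref_year parameters, and the season and x column names are invented for this example. It places each row on the reference year by the number of days elapsed since its own season began, so it should also cope with weekly or irregular data.

```python
import pandas as pd


def add_season_columns(df: pd.DataFrame, start_month: int = 6,
                       ref_year: int = 2015) -> pd.DataFrame:
    """Add a season label and a shared x-axis date (hypothetical helper)."""
    out = df.copy()
    # A season is named after the calendar year in which it starts
    start_year = out["Date"].dt.year - (out["Date"].dt.month < start_month).astype(int)
    season_start = pd.to_datetime(
        pd.DataFrame({"year": start_year, "month": start_month, "day": 1})
    )
    # Label such as "2018/19"
    next_year = ((start_year + 1) % 100).astype(str).str.zfill(2)
    out["season"] = start_year.astype(str) + "/" + next_year
    # Days elapsed since the season started, re-applied to the reference year
    ref_start = pd.Timestamp(year=ref_year, month=start_month, day=1)
    out["x"] = ref_start + (out["Date"] - season_start)
    return out


df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-18", "2018-06-25", "2019-05-27", "2019-06-03"]),
    "Rate": [5.0, 7.1, 7.6, 4.3],
})
seasons = add_season_columns(df)

# One group per season; each group would become one trace
for label, grp in seasons.groupby("season"):
    print(label, len(grp))
```

Each group could then be passed to fig.add_trace(go.Scatter(x=grp["x"], y=grp["Rate"], name=label, mode="lines")). Because positions are based on elapsed days, dates after the end of February can sit one day off when the reference season contains 29 February and the plotted season does not, or the other way round.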
Upvotes: 1