Reputation: 145
I'm trying to remove the weekend gaps from this time series plot. The x-axis is a datetime stamp. I've tried the code on this site, but can't get it to work. See the sample file used.
The data looks like this:
+-----------------------+---------------------+-------------+-------------+
| asof | INSERTED_TIME | DATA_SOURCE | PRICE |
+-----------------------+---------------------+-------------+-------------+
| 2020-06-17 00:00:00 | 2020-06-17 12:00:15 | DB | 170.4261757 |
+-----------------------+---------------------+-------------+-------------+
| 2020-06-17 00:00:00 | 2020-06-17 12:06:10 | DB | 168.9348656 |
+-----------------------+---------------------+-------------+-------------+
| 2020-06-17 00:00:00 | 2020-06-17 12:06:29 | DB | 168.8412129 |
+-----------------------+---------------------+-------------+-------------+
| 2020-06-17 00:00:00 | 2020-06-17 12:07:27 | DB | 169.878796 |
+-----------------------+---------------------+-------------+-------------+
| 2020-06-17 00:00:00 | 2020-06-17 12:10:28 | DB | 169.3685879 |
+-----------------------+---------------------+-------------+-------------+
| 2020-06-17 00:00:00 | 2020-06-17 12:12:14 | DB | 169.0787045 |
+-----------------------+---------------------+-------------+-------------+
| 2020-06-17 00:00:00 | 2020-06-17 12:12:33 | DB | 169.7561092 |
+-----------------------+---------------------+-------------+-------------+
Plot including weekend breaks
Using the line function I'm getting the plot below, with straight lines going from Friday end of day to Monday morning. Using px.scatter, I don't get the line, but I still get the gap.
import plotly.express as px
import pandas as pd
sampledf = pd.read_excel('sample.xlsx')
fig_sample = px.line(sampledf, x = 'INSERTED_TIME', y= 'PRICE', color = 'DATA_SOURCE')
fig_sample.show()
Attempt with no weekend breaks
fig_sample = px.line(sampledf, x = 'INSERTED_TIME', y= 'PRICE', color = 'DATA_SOURCE')
fig_sample.update_xaxes(
    rangebreaks=[
        dict(bounds=["sat", "mon"])  # hide weekends
    ]
)
fig_sample.show()
Using rangebreaks results in a blank plot.
Any help is appreciated. Thanks
Upvotes: 5
Views: 7100
Reputation: 129
You can also use render_mode='svg' on px.line:
import plotly.express as px
import pandas as pd
sampledf = pd.read_excel('sample.xlsx')
fig_sample = px.line(sampledf, x = 'INSERTED_TIME', y= 'PRICE', color = 'DATA_SOURCE', render_mode='svg')
fig_sample.update_xaxes(
    rangebreaks=[
        dict(bounds=["sat", "mon"])  # hide weekends
    ]
)
fig_sample.show()
However, for px.timeline or other px functions that don't have a render_mode parameter, you should use:
dict(pattern = "hour", dvalue = 60*60*1000,values = start_of_break)
start_of_break is a list of the start dates of every break you want to hide. dvalue is the duration of each break in milliseconds. Here it is 60 minutes * 60 seconds * 1000 ms, i.e. one hour.
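As a rough sketch of how such a list could be built, the snippet below generates one break start per day with the standard library only. The helper name `daily_break_starts`, the date range, and the 12:00 break hour are all made up for illustration. The `pattern` key is left out because, as far as I understand, it only applies to `bounds`, not to `values`.

```python
from datetime import datetime, timedelta

HOUR_MS = 60 * 60 * 1000  # one hour expressed in milliseconds


def daily_break_starts(first_day, last_day, hour):
    """Return the start of a daily break at `hour` as a list of date strings.

    Hypothetical helper: one entry per calendar day from first_day to last_day.
    """
    starts = []
    current = first_day.replace(hour=hour, minute=0, second=0, microsecond=0)
    while current <= last_day:
        starts.append(current.strftime("%Y-%m-%d %H:%M:%S"))
        current += timedelta(days=1)
    return starts


# Example range (an assumption, not taken from the sample file).
start_of_break = daily_break_starts(
    datetime(2020, 6, 15), datetime(2020, 6, 19, 23, 59), hour=12
)

# Each value marks where a break begins; dvalue is how long it lasts.
rangebreak = dict(values=start_of_break, dvalue=HOUR_MS)
```

The resulting dict would then be passed as `fig.update_xaxes(rangebreaks=[rangebreak])`.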
Upvotes: 1
Reputation: 145
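There is a limitation of 1000 rows when using rangebreaks. As far as I can tell, this is because Plotly Express switches to WebGL rendering by default above 1000 points, and rangebreaks are not applied to WebGL traces.
When working with more than 1000 rows, add the parameter render_mode='svg' to force SVG rendering.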
In the code below I've used the scatter function, but as you can see the large weekend gaps are no longer there. Additionally, I've excluded the times between 11 PM and 11 AM.
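import plotly.express as px
import pandas as pd
sampledf = pd.read_excel('sample.xlsx')
# render_mode='svg' keeps rangebreaks working with more than 1000 rows
fig_sample = px.scatter(sampledf, x='INSERTED_TIME', y='PRICE', color='DATA_SOURCE', render_mode='svg')
fig_sample.update_xaxes(
    rangebreaks=[
        {'pattern': 'day of week', 'bounds': [6, 1]},  # hide Saturday and Sunday
        {'pattern': 'hour', 'bounds': [23, 11]}  # hide 11 PM to 11 AM
    ]
)
fig_sample.show()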
The values in the plot are different from the original data set, but will work with the data in the original post. Found help here
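To make the two bounds entries more concrete, here is a rough pure-Python sketch of which timestamps they should hide. This is only my reading of the rangebreaks semantics, not Plotly's own code: the function names are made up, and treating the lower bound as inclusive and the upper bound as exclusive is an assumption.

```python
from datetime import datetime


def in_wrapped_range(value, lower, upper):
    """True if value lies in [lower, upper), wrapping around when lower > upper."""
    if lower <= upper:
        return lower <= value < upper
    return value >= lower or value < upper


def is_hidden(ts):
    """Approximate whether the two rangebreaks above would hide timestamp ts."""
    # Plotly's day-of-week integers are Sunday-based (Sunday=0, Saturday=6),
    # while datetime.weekday() is Monday-based (Monday=0, Sunday=6).
    day_of_week = (ts.weekday() + 1) % 7
    hour = ts.hour + ts.minute / 60 + ts.second / 3600
    weekend = in_wrapped_range(day_of_week, 6, 1)  # Saturday and Sunday
    overnight = in_wrapped_range(hour, 23, 11)     # 23:00 up to 11:00
    return weekend or overnight


print(is_hidden(datetime(2020, 6, 17, 12, 0, 15)))  # a Wednesday at midday
print(is_hidden(datetime(2020, 6, 20, 12, 0, 0)))   # a Saturday at midday
```

A check like this can help confirm that the rows you expect to see actually fall outside both breaks before blaming the plot.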
Upvotes: 6
Reputation: 2821
Looks like the x axis on the blank plot does not even have the right range, since it begins in a different year. It's hard to explain the behavior without looking at the exact data input, but you can start with a simpler, working dataset and check for differences (try plotting a filtered version of the data with select points, or check for differences in the dtypes of the DataFrame, etc.).
You will see the expected behavior with a simpler dataset:
import plotly.express as px
import pandas as pd
from datetime import datetime
# One row per day in May 2020; col2 is 0 on weekends and the day number otherwise
data = {'col1': [datetime(2020, 5, day) for day in range(1, 30)],
        'col2': [day if (day + 3) % 7 not in (5, 6) else 0 for day in range(1, 30)]}
df = pd.DataFrame(data=data)
# Same data with the Saturday and Sunday rows dropped (Monday=0 ... Friday=4)
df_weekdays = df[df['col1'].dt.dayofweek.isin([0, 1, 2, 3, 4])]
f = px.line(df, x='col1', y='col2')
f.update_xaxes(
    rangebreaks=[
        dict(bounds=["sat", "mon"]),  # hide weekends
    ]
)
f.show()
For the DataFrame without weekends, df_weekdays, it's a similar image:
Upvotes: 0