Z Awang

Reputation: 51

How to create a high-resolution time series plot in Julia

I have successfully created a plot using TimeSeries.jl, with Julia Version 1.5.3 (2020-11-09), in Juno, installed with JuliaPro, with the following code:

ATTEMPT 1:

using IterableTables
using DataFrames
using CSV
using Dates
using TimeSeries
using Plots


myfile="test2.csv"
dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile); dateformat=dmft))
println(first(df,10))

ta = TimeArray(df; timestamp = :Date)
println(colnames(ta))
display(plot(ta[:Col3]))

And obtained this plot:

[image: timeseries plot in Juno]

with the following output in my REPL:

10×5 DataFrame
│ Row │ Date                │ Col1    │ Col2    │ Col3    │ Col4    │
│     │ DateTime            │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 2020-08-10T00:00:00 │ 507.28  │ 181.34  │ 1532.96 │ 183.16  │
│ 2   │ 2020-08-10T00:01:00 │ 507.29  │ 181.34  │ 1532.95 │ 183.16  │
│ 3   │ 2020-08-10T00:02:00 │ 507.27  │ 181.34  │ 1532.94 │ 183.16  │
│ 4   │ 2020-08-10T00:03:00 │ 507.28  │ 181.34  │ 1532.97 │ 183.16  │
│ 5   │ 2020-08-10T00:04:00 │ 507.29  │ 181.33  │ 1532.97 │ 183.16  │
│ 6   │ 2020-08-10T00:05:00 │ 507.29  │ 181.33  │ 1532.96 │ 183.16  │
│ 7   │ 2020-08-10T00:06:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 8   │ 2020-08-10T00:07:00 │ 507.28  │ 181.33  │ 1532.96 │ 183.16  │
│ 9   │ 2020-08-10T00:08:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 10  │ 2020-08-10T00:09:00 │ 507.28  │ 181.32  │ 1532.96 │ 183.16  │
[:Col1, :Col2, :Col3, :Col4]

Unfortunately, it came out as an image, and when I zoom in the resolution is low, as can be seen below.

[image: zoomed in image]

WHAT I WOULD LIKE TO ACHIEVE:

Ideally, I would prefer a high-resolution plot like the one below, which I can zoom into properly using Shift and the left mouse button.

[image]

The DataFrame for the above image looks like this:

julia> print(first(mydf2,10))
10×8 DataFrame
│ Row │ ticker │ timestamp  │ Open    │ High    │ Low     │ Close   │ AdjClose │ Volume    │
│     │ String │ Date       │ Float64 │ Float64 │ Float64 │ Float64 │ Float64  │ Float64   │
├─────┼────────┼────────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┤
│ 1   │ MSFT   │ 2010-12-27 │ 28.12   │ 28.2    │ 27.88   │ 28.07   │ 22.3176  │ 2.16528e7 │
│ 2   │ MSFT   │ 2010-12-28 │ 27.97   │ 28.17   │ 27.96   │ 28.01   │ 22.2699  │ 2.30422e7 │
│ 3   │ MSFT   │ 2010-12-29 │ 27.94   │ 28.12   │ 27.88   │ 27.97   │ 22.2381  │ 1.95025e7 │
│ 4   │ MSFT   │ 2010-12-30 │ 27.92   │ 28.0    │ 27.78   │ 27.85   │ 22.1427  │ 2.07861e7 │
│ 5   │ MSFT   │ 2010-12-31 │ 27.8    │ 27.92   │ 27.63   │ 27.91   │ 22.1904  │ 2.4752e7  │
│ 6   │ MSFT   │ 2011-01-03 │ 28.05   │ 28.18   │ 27.92   │ 27.98   │ 22.2461  │ 5.34438e7 │
│ 7   │ MSFT   │ 2011-01-04 │ 27.94   │ 28.17   │ 27.85   │ 28.09   │ 22.3335  │ 5.44056e7 │
│ 8   │ MSFT   │ 2011-01-05 │ 27.9    │ 28.01   │ 27.77   │ 28.0    │ 22.262   │ 5.89987e7 │
│ 9   │ MSFT   │ 2011-01-06 │ 28.04   │ 28.85   │ 27.86   │ 28.82   │ 22.9139  │ 8.80263e7 │
│ 10  │ MSFT   │ 2011-01-07 │ 28.64   │ 28.74   │ 28.25   │ 28.6    │ 22.739   │ 7.3762e7  │

using data from MarketData.jl with the following code to plot:

using Gadfly
display(plot(mydf2,x="timestamp",y="AdjClose", Geom.line))

ATTEMPT 2:

I tried to achieve similar results with my first data series, this time skipping the TimeArray (since it didn't help in Attempt 1), and got an error. The code:

myfile="test2.csv"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile)))
println(first(df,10))
display(plot(df,x="Date",y="Col3", Geom.line))

I got the following dataframe and error message:

10×5 DataFrame
│ Row │ Date                │ Col1    │ Col2    │ Col3    │ Col4    │
│     │ DateTime            │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 2020-08-10T00:00:00 │ 507.28  │ 181.34  │ 1532.96 │ 183.16  │
│ 2   │ 2020-08-10T00:01:00 │ 507.29  │ 181.34  │ 1532.95 │ 183.16  │
│ 3   │ 2020-08-10T00:02:00 │ 507.27  │ 181.34  │ 1532.94 │ 183.16  │
│ 4   │ 2020-08-10T00:03:00 │ 507.28  │ 181.34  │ 1532.97 │ 183.16  │
│ 5   │ 2020-08-10T00:04:00 │ 507.29  │ 181.33  │ 1532.97 │ 183.16  │
│ 6   │ 2020-08-10T00:05:00 │ 507.29  │ 181.33  │ 1532.96 │ 183.16  │
│ 7   │ 2020-08-10T00:06:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 8   │ 2020-08-10T00:07:00 │ 507.28  │ 181.33  │ 1532.96 │ 183.16  │
│ 9   │ 2020-08-10T00:08:00 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 10  │ 2020-08-10T00:09:00 │ 507.28  │ 181.32  │ 1532.96 │ 183.16  │
ERROR: LoadError: Cannot convert DataFrame to series data for plotting

ATTEMPT 3:

Since the column is in DateTime format, I wonder why that is an issue. So I tried something different: not changing the format when loading the data, and still not using the TimeArray:

myfile="test2.csv"
# dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile))) # dateformat=dmft removed
println(first(df,10))

display(plot(df,x="Date",y="Col3", Geom.line))

but I still got this result:

10×5 DataFrame
│ Row │ Date           │ Col1    │ Col2    │ Col3    │ Col4    │
│     │ String         │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 10/8/2020 0:00 │ 507.28  │ 181.34  │ 1532.96 │ 183.16  │
│ 2   │ 10/8/2020 0:01 │ 507.29  │ 181.34  │ 1532.95 │ 183.16  │
│ 3   │ 10/8/2020 0:02 │ 507.27  │ 181.34  │ 1532.94 │ 183.16  │
│ 4   │ 10/8/2020 0:03 │ 507.28  │ 181.34  │ 1532.97 │ 183.16  │
│ 5   │ 10/8/2020 0:04 │ 507.29  │ 181.33  │ 1532.97 │ 183.16  │
│ 6   │ 10/8/2020 0:05 │ 507.29  │ 181.33  │ 1532.96 │ 183.16  │
│ 7   │ 10/8/2020 0:06 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 8   │ 10/8/2020 0:07 │ 507.28  │ 181.33  │ 1532.96 │ 183.16  │
│ 9   │ 10/8/2020 0:08 │ 507.27  │ 181.33  │ 1532.95 │ 183.16  │
│ 10  │ 10/8/2020 0:09 │ 507.28  │ 181.32  │ 1532.96 │ 183.16  │
ERROR: LoadError: Cannot convert DataFrame to series data for plotting

I suspect the issue is with the Date or DateTime type, but I haven't been able to nail it down. There was a post on plotting time series data using String instead (Gadfly.jl : How to plot date time based?), which led to my attempt below:

ATTEMPT 4:

myfile="test2.csv"
dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile); dateformat=dmft)) # historical data for the ticker

dt = Array(df.Date)
dt_str = Array(String,length(dt))
for i=1:length(dt)
    dt_str[i] = string(dt[i]);
end

with the following error message:

ERROR: LoadError: MethodError: no method matching Array(::Type{String}, ::Int64)
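
A side note on this particular error, as a sketch rather than a fix for the plotting problem: `Array(String, n)` is pre-1.0 syntax, and Julia 1.x expects `Vector{String}(undef, n)` or a broadcast `string.(...)` instead. The `dt` values below are made up to stand in for `df.Date`.

```julia
using Dates

# Stand-in for df.Date: a few hypothetical one-minute DateTime samples
dt = [DateTime(2020, 8, 10, 0, m) for m in 0:2]

# Uninitialized vector: the Julia 1.x replacement for Array(String, n)
dt_str = Vector{String}(undef, length(dt))
for i in eachindex(dt)
    dt_str[i] = string(dt[i])
end

# The same conversion in one line with broadcasting
dt_str2 = string.(dt)
```

Both forms produce a `Vector{String}`; the broadcast version is usually the more idiomatic choice.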

This is a small snippet of my csv, in case you want to try it out.

Date,Col1,Col2,Col3,Col4
10/8/2020 0:00,507.28,181.34,1532.96,183.16
10/8/2020 0:01,507.29,181.34,1532.95,183.16
10/8/2020 0:02,507.27,181.34,1532.94,183.16
10/8/2020 0:03,507.28,181.34,1532.97,183.16
10/8/2020 0:04,507.29,181.33,1532.97,183.16
10/8/2020 0:05,507.29,181.33,1532.96,183.16
10/8/2020 0:06,507.27,181.33,1532.95,183.16
10/8/2020 0:07,507.28,181.33,1532.96,183.16
10/8/2020 0:08,507.27,181.33,1532.95,183.16
10/8/2020 0:09,507.28,181.32,1532.96,183.16
10/8/2020 0:10,507.29,181.32,1532.97,183.16
10/8/2020 0:11,507.28,181.33,1532.94,183.16
10/8/2020 0:12,507.27,181.33,1532.96,183.16
10/8/2020 0:13,507.31,181.33,1532.96,183.17

I am a newcomer to Julia, any beginner's level guide is most appreciated.

EDIT: The issue here is that the plot is rendered as an image. I exported it as SVG and this is what I get. Not very appealing, right? All the high-resolution data got clustered together.

[image]

Once the plot is rendered as an image, which is what TimeSeries.jl does as opposed to plotting with Plotly or Gadfly (or whatever other backend), I lose the ability to zoom into the plot.

As long as it is high resolution and not rendered as an image, I am fine whether it is Plotly, Gadfly or something else.

Yes, the post is long. If that doesn't help, just ignore my code. At the end of the post, I have supplied a short CSV in case anyone doesn't mind showing me how it is supposed to be done correctly. Here it is again.

Date,Col1,Col2,Col3,Col4
10/8/2020 0:00,507.28,181.34,1532.96,183.16
10/8/2020 0:01,507.29,181.34,1532.95,183.16
10/8/2020 0:02,507.27,181.34,1532.94,183.16
10/8/2020 0:03,507.28,181.34,1532.97,183.16
10/8/2020 0:04,507.29,181.33,1532.97,183.16
10/8/2020 0:05,507.29,181.33,1532.96,183.16
10/8/2020 0:06,507.27,181.33,1532.95,183.16
10/8/2020 0:07,507.28,181.33,1532.96,183.16
10/8/2020 0:08,507.27,181.33,1532.95,183.16
10/8/2020 0:09,507.28,181.32,1532.96,183.16
10/8/2020 0:10,507.29,181.32,1532.97,183.16
10/8/2020 0:11,507.28,181.33,1532.94,183.16
10/8/2020 0:12,507.27,181.33,1532.96,183.16
10/8/2020 0:13,507.31,181.33,1532.96,183.17

Upvotes: 3

Views: 790

Answers (1)

Z Awang

Reputation: 51

This one seems to work.

Apparently, if the Date column is left as a String, it will still work. Not sure why I didn't try this yesterday.

using DataFrames
using CSV
using Gadfly
myfile="test2.csv"
# no dateformat passed, so the Date column is read as String
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile)))
println(first(df,10))
display(plot(df, x="Date", y="Col3", Guide.xticks(label=false), Geom.line, Theme(grid_line_width=0mm)))
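
The message `Cannot convert DataFrame to series data for plotting` in the question is raised by Plots.jl rather than Gadfly, so one plausible reading is that `plot` still resolved to `Plots.plot` in a session where both packages had been loaded. A minimal sketch that sidesteps the name clash by qualifying the call; the small inline DataFrame is an assumption that mirrors the first rows of the CSV:

```julia
using DataFrames, Dates
import Gadfly              # import instead of using, so `plot` is never ambiguous
using Gadfly: Geom

# Illustrative data mirroring the first rows of test2.csv
df = DataFrame(
    Date = DateTime(2020, 8, 10, 0, 0) .+ Minute.(0:4),
    Col3 = [1532.96, 1532.95, 1532.94, 1532.97, 1532.97],
)

# Fully qualified call: always Gadfly's plot, even if Plots is loaded too
p = Gadfly.plot(df, x = :Date, y = :Col3, Geom.line)
```

Written this way, the DateTime column can be passed as-is and no String conversion is needed.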

I have also tried Plotly, and it works better. I went down a rabbit hole because of a post that said the DateTime must be converted to String. That is not true.

using IterableTables
using DataFrames
using CSV
using Dates
using Plots
myfile="test2.csv"
dmft = dateformat"d/m/yyyy HH:MM:SS"
df = DataFrame(CSV.File(joinpath(@__DIR__,myfile); dateformat=dmft))
println(first(df,10))
df2 = filter(row -> row[:Date] <= Dates.DateTime("2020-10-15T00:06:00"), df)
plotly()
using StatsPlots
@df df plot(:Date, :Col3)

Plotly in Juno
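
The `@df` macro from StatsPlots is a convenience; as a sketch, the same kind of interactive plot can be produced by passing the columns to `plot` directly. The inline DataFrame here is an assumption standing in for the parsed CSV:

```julia
using DataFrames, Dates
using Plots

plotly()  # interactive backend: the figure can be zoomed and panned

# Illustrative data mirroring the first rows of test2.csv
df = DataFrame(
    Date = DateTime(2020, 8, 10, 0, 0) .+ Minute.(0:4),
    Col3 = [1532.96, 1532.95, 1532.94, 1532.97, 1532.97],
)

# Pass the DateTime and Float64 columns as plain vectors
p = plot(df.Date, df.Col3; xlabel = "Date", ylabel = "Col3", legend = false)
```

Calling `display(p)` in Juno should then open the zoomable plot in the plot pane.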

Upvotes: 2
