Colin T Bowers

Reputation: 18560

How to store data with a tree-like structure in Julia

I want to store high-frequency financial data in memory while I work with it in Julia.

My data is stored in many arrays of Float64. Each array holds the high-frequency data for a single day, for one security, on one market. For example, there is one array of Float64 for IBM, listed on the NYSE (New York Stock Exchange), on 2010-01-04.

I have many such arrays, spanning multiple dates, markets, and securities. I want to store them all in one object so that any given array is easy to retrieve (probably by exploiting the tree-like structure of the metadata).

In MATLAB, I used to store this in a structure, where the first level is the market, the next level is the security, the next level is the date, and the corresponding array sits at the end of the tree. At each level I also stored a list of the fields at that level.

Julia doesn't really have an equivalent to MATLAB structures, so what is the best way to do this in Julia?

Currently, the best I can come up with is a sequence of nested composite types, each with two fields. For example:

using Dates

mutable struct HighFrequencyData
    dateList::Array{Date, 1}
    dataArray::Array{Any, 1}
end

where dateList stores a list of dates that correspond to a sequence of arrays of Float64 held in dataArray (i.e. dateList and dataArray will have the same length). Then:

mutable struct securitiesData
    securityList::Array{String, 1}
    highFrequencyArray::Array{Any, 1}
end

where securityList stores a list of securities that correspond to a sequence of HighFrequencyData objects held in highFrequencyArray. Then:

mutable struct marketsData
    marketList::Array{String, 1}
    securitiesArray::Array{Any, 1}
end

where marketList stores a list of markets that correspond to a sequence of securitiesData objects held in securitiesArray.

Given this, all data can now be stored in a variable of type marketsData, and looked up using marketList, securityList, and dateList, at each level of nesting.
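
To make the lookup concrete, here is a minimal sketch of what retrieval could look like under this layout. It assumes a recent Julia version (where `type` is spelled `mutable struct`); the `lookup` helper and the price values are hypothetical and only for illustration.

```julia
using Dates

# Same three types as described above.
mutable struct HighFrequencyData
    dateList::Array{Date, 1}
    dataArray::Array{Any, 1}
end

mutable struct securitiesData
    securityList::Array{String, 1}
    highFrequencyArray::Array{Any, 1}
end

mutable struct marketsData
    marketList::Array{String, 1}
    securitiesArray::Array{Any, 1}
end

# Hypothetical helper: find the position of each key in its list, then descend one level.
function lookup(m::marketsData, market::String, security::String, date::Date)
    i = findfirst(==(market), m.marketList)
    i === nothing && error("unknown market: $market")
    s = m.securitiesArray[i]
    j = findfirst(==(security), s.securityList)
    j === nothing && error("unknown security: $security")
    h = s.highFrequencyArray[j]
    k = findfirst(==(date), h.dateList)
    k === nothing && error("no data for date: $date")
    return h.dataArray[k]
end

# Made-up prices, purely illustrative.
hf = HighFrequencyData([Date(2010, 1, 4)], Any[[125.1, 125.3, 125.2]])
sec = securitiesData(["IBM"], Any[hf])
allData = marketsData(["NYSE"], Any[sec])

x = lookup(allData, "NYSE", "IBM", Date(2010, 1, 4))
```

Every level needs its own linear search through a list, and the `Any` element types mean nothing checks what is stored at each level.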

But this feels a bit cumbersome...

Upvotes: 3

Views: 2268

Answers (1)

IainDunning

Reputation: 11664

Your type hierarchy looks OK, but maybe dictionaries are all you need?

all_data = Dict(
    "Market1" => Dict(
        "Sec1" => ([20140827, 20140825], [1.05, 10.6]),
        "Sec2" => ([20140827, 20140825], [1.05, 10.6])),
    "Market2" => Dict(
        "Sec1" => ([20140827, 20140825], [1.05, 10.6]),
        "Sec2" => ([20140827, 20140825], [1.05, 10.6])),
    # ...
)

# Each security maps to a (dates, values) tuple; element 2 holds the values.
println(all_data["Market1"]["Sec1"][2] ./ all_data["Market2"]["Sec1"][2])
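
For the layout in the question (market, then security, then date, then the array), a nested dictionary keyed by `Date` at the innermost level might look like the sketch below. It assumes a recent Julia version; the names `DayData`, `SecurityData`, and `store!` are hypothetical, and the prices are made up.

```julia
using Dates

# market => security => date => array of Float64
const DayData = Dict{Date, Vector{Float64}}
const SecurityData = Dict{String, DayData}

all_data = Dict{String, SecurityData}()

# Hypothetical helper: create the intermediate levels on demand with get!.
function store!(d, market, security, date, x)
    secs = get!(d, market) do
        SecurityData()
    end
    days = get!(secs, security) do
        DayData()
    end
    days[date] = x
    return d
end

store!(all_data, "NYSE", "IBM", Date(2010, 1, 4), [125.1, 125.3, 125.2])
store!(all_data, "NYSE", "IBM", Date(2010, 1, 5), [126.0, 125.8])

# Retrieval is a chain of key lookups.
x = all_data["NYSE"]["IBM"][Date(2010, 1, 4)]

# The "list of fields" at any level is just the keys of that dictionary.
securities = collect(keys(all_data["NYSE"]))
dates = sort(collect(keys(all_data["NYSE"]["IBM"])))
```

The dictionary keys take over the role of the separate list kept at each level of the MATLAB structure, so there is nothing to keep in sync by hand.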

If you could post what the MATLAB code looks like, that might be helpful too.

I would reformulate your types a little bit, maybe something simpler like:

using Dates

mutable struct TimeSeries
    dates::Vector{Date}
    data::Vector{Any}
end

const Security = Tuple{String, TimeSeries}
const Market = Vector{Security}

markets = Market[]

push!(markets, [("Sec1", TimeSeries(...)), ("Sec2", TimeSeries(...))])
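
As a rough illustration of how these types could be filled and queried, here is a self-contained sketch, assuming a recent Julia version. The `getday` helper is hypothetical, and the dates and prices are made up.

```julia
using Dates

mutable struct TimeSeries
    dates::Vector{Date}
    data::Vector{Any}
end

const Security = Tuple{String, TimeSeries}
const Market = Vector{Security}

# Hypothetical helper: return the array stored for a security on a given date.
function getday(market::Market, name::AbstractString, date::Date)
    for (secname, ts) in market
        if secname == name
            k = findfirst(==(date), ts.dates)
            k === nothing && error("no data for $name on $date")
            return ts.data[k]
        end
    end
    error("unknown security: $name")
end

# Made-up data: one intraday array per date.
ibm = TimeSeries([Date(2010, 1, 4), Date(2010, 1, 5)],
                 Any[[125.1, 125.3], [126.0, 125.8]])
nyse = Security[("IBM", ibm)]

markets = Market[]
push!(markets, nyse)

y = getday(markets[1], "IBM", Date(2010, 1, 5))
```

Note that in this layout the markets themselves have no names, so a market has to be picked by position (`markets[1]`) or wrapped in a dictionary keyed by market name.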

Also, make sure to check out https://github.com/JuliaStats/TimeSeries.jl

Upvotes: 5
