Positonic

Reputation: 9411

Storing large amounts of data in Redis / NoSQL or a relational db?

I need to store and access financial market candlestick information.

The number of candlesticks I will need to store is beginning to look staggering (huge). There are 1000s of markets, each market has many trading pairs, each pair has many time frames, and each time frame is an array of candles like the one below. The array below could be hourly price data or daily price data, for example.

I need to make this information available to multiple users at any given time, so need to store it and make it available somehow.

The data looks something like this:

[
    {
        time: 1528761600,
        openPrice: 100,
        closePrice: 20,
        highestPrice: 120,
        lowestPrice: 10
    },
    {
        time: 1528761610,
        openPrice: 100,
        closePrice: 20,
        highestPrice: 120,
        lowestPrice: 10
    },
    {
        time: 1528761630,
        openPrice: 100,
        closePrice: 20,
        highestPrice: 120,
        lowestPrice: 10
    }
]

Consumers of the data will mostly be a complex JavaScript-based charting app, but other consumers will be Node code and perhaps other backend code.

My current best idea is to save the candlesticks in Redis, though I have also considered a NoSQL database. I'm not super experienced in either, so I'm not 100% sure Redis is the right choice. It seems to be the most performant option, but perhaps harder to work with: I'm having to learn a lot, and I'm not convinced that the way Redis saves and retrieves data is going to make this easy, since I will need to continually add candles to each array.

I'm currently thinking something like:

Do an initial fetch from the candlestick API and either:

  1. Create a Redis hash with a suitable label and stringify the whole array of candles into the hash, so that it can be parsed by JavaScript etc. (rough sketch below)

Drawbacks of this approach:

Every time a new candle is created, I have to parse the JSON, add any new candlesticks, then stringify and save it again.

Pros of this approach:

I can use JavaScript to manage the array and make sure it's sorted, etc.
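
Roughly what I have in mind for option 1, in Node with the ioredis client (the hash name and labels are just examples):

const Redis = require("ioredis");
const redis = new Redis();

// One hash ("candles") whose fields are labels like "BTC-USD:1h", each holding
// the whole candle array as a JSON string.
async function saveCandles(label, candles) {
  await redis.hset("candles", label, JSON.stringify(candles));
}

async function loadCandles(label) {
  const json = await redis.hget("candles", label);
  return json ? JSON.parse(json) : [];
}

// The drawback described above: every new candle means parse, push, re-sort,
// stringify and save the whole array again.
async function appendCandle(label, candle) {
  const candles = await loadCandles(label);
  candles.push(candle);
  candles.sort((a, b) => a.time - b.time);
  await saveCandles(label, candles);
}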

  2. Create a Redis list of timestamps, which allows me to just push new candles onto the list and trust it to be in the right order. I can then do a Redis SCAN? to return timestamps between the specific dates and then use the timestamps to pull the data out of a Redis hash. After retrieving all of this, build a JSON object similar to the above to pass to JavaScript.
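
And roughly what I have in mind for option 2, reusing the same client - a list of timestamps plus one small hash per candle, using LRANGE rather than SCAN to walk the list (key names are again just examples):

// Append a candle: push its timestamp onto a list and store its fields in a
// hash keyed by that timestamp.
async function addCandle(prefix, candle) {
  await redis
    .multi()
    .rpush(`${prefix}:times`, candle.time)
    .hset(`${prefix}:${candle.time}`, candle) // older ioredis needs hmset for the object form
    .exec();
}

// Fetch candles between two timestamps: filter the list client-side, then pull
// each candle hash in one pipeline.
async function getCandles(prefix, from, to) {
  const times = (await redis.lrange(`${prefix}:times`, 0, -1))
    .map(Number)
    .filter((t) => t >= from && t <= to);

  const pipeline = redis.pipeline();
  times.forEach((t) => pipeline.hgetall(`${prefix}:${t}`));
  const replies = await pipeline.exec(); // array of [err, value]
  // HGETALL returns every field as a string, so numbers need converting back.
  return replies.map(([, candle]) => candle);
}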

I have to say that both of these approaches feel way more painful to me than putting the data in a relational database. I imagine that a NoSQL database could also be way easier, but I'm not experienced with them, so I can't say for sure.

I'm a bit lost and out of my experience here, as you can tell, and would love any advice anyone can give me.

Thanks :)

Upvotes: 4

Views: 4123

Answers (2)

Sripathi Krishnan

Reputation: 31538

Your data is very regular - each candlestick is essentially one 64-bit long for the timestamp and four 32-bit numbers for the prices. This makes it very amenable to a Redis bitfield.

Storing the data

Here is how I would store it -

  1. stock-symbol:daily_prices = bitfield with 30 * 4 integers, assuming you are storing data for the past 30 days
  2. stock-symbol:hourly_prices = bitfield with 24 * 4 integers

This way, your memory is (30*4 + 24*4) * 4 bytes = 864 bytes per symbol + constant overhead per key.

You don't need to store the timestamp (see below). Also, I have assumed 4 bytes to store the price. You can store it as a whole number by eliminating the decimal.
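
For example, just to illustrate the whole-number idea:

// Store prices as integer cents so they fit a signed 32-bit slot,
// and divide by 100 again when reading them back.
const toCents = (price) => Math.round(price * 100);   // 123.45 -> 12345
const fromCents = (cents) => cents / 100;             // 12345  -> 123.45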

Writing the data

To insert hourly prices, find the current hour (say 07:00 hours). If you treat the bitfield as an array of 4-byte integers, you will have to skip 7 * 4 = 28 integers. You then insert the prices at positions 28, 29, 30 and 31 (0-based indexes).

So, to store the prices for AAPL at 07:00 hours, you would run the command

bitfield AAPL:hourly_prices set i32 #28 <open price> set i32 #29 <close price> set i32 #30 <highest price> set i32 #31 <lowest price>

You would do something similar for daily prices as well.
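
In Node with ioredis, that write might look something like the following sketch; ioredis passes the arguments straight through to BITFIELD, and the "#" prefix makes each offset count in i32-sized slots rather than bits. The key name and the assumption that prices already arrive as whole numbers are just for illustration.

const Redis = require("ioredis");
const redis = new Redis();

// Write one hourly candle. "hour" is 0-23; prices are assumed to already be
// whole numbers (e.g. cents).
async function writeHourlyPrices(symbol, hour, candle) {
  const base = hour * 4; // 4 integers per hour: open, close, high, low
  await redis.bitfield(
    `${symbol}:hourly_prices`,
    "SET", "i32", `#${base}`, candle.openPrice,
    "SET", "i32", `#${base + 1}`, candle.closePrice,
    "SET", "i32", `#${base + 2}`, candle.highestPrice,
    "SET", "i32", `#${base + 3}`, candle.lowestPrice
  );
}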

Reading Data

If you are building a charting library, most likely you would want to return data for multiple symbols for a given time range. Let's say you want to pull out daily prices for the past 7 days; your logic would be -

  1. For each symbol:
    1. Get start and end range within the array
    2. Invoke the GETRANGE command.

If you run this in a pipeline, it will be very fast.
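
Continuing the Node sketch above, a pipelined read might look like this. GETRANGE works on byte offsets, each day occupies 4 * 4 = 16 bytes, and getrangeBuffer is ioredis's buffer-returning variant of the command.

// Read daily candles for several symbols in one round trip.
// startDay/endDay are 0-based day indexes into the bitfield.
async function readDailyPrices(symbols, startDay, endDay) {
  const pipeline = redis.pipeline();
  for (const symbol of symbols) {
    pipeline.getrangeBuffer(
      `${symbol}:daily_prices`,
      startDay * 16,
      endDay * 16 + 15 // GETRANGE's end offset is inclusive
    );
  }
  const replies = await pipeline.exec(); // array of [err, Buffer]
  return replies.map(([, buf]) => {
    const days = [];
    // BITFIELD stores i32 values big-endian, 4 per day.
    for (let off = 0; off + 16 <= buf.length; off += 16) {
      days.push({
        openPrice: buf.readInt32BE(off),
        closePrice: buf.readInt32BE(off + 4),
        highestPrice: buf.readInt32BE(off + 8),
        lowestPrice: buf.readInt32BE(off + 12),
      });
    }
    return days;
  });
}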

Other tips

Usually, you would want to filter by some property of the symbol. For example, "show me graphs of top 10 tech companies for the last 5 days".

A symbol itself is relational data. I would recommend storing that in a relational database. Just get the symbol names as a list from the relational database, and then fetch the stock prices from redis.
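
A rough sketch of that split (the table and column names are invented, and query() stands in for whatever SQL client you use):

// Fetch the symbols to chart from the relational database, then pull their
// prices out of redis with the pipelined read from the previous sketch.
async function topTechCharts(db, days) {
  const rows = await db.query(
    "SELECT symbol FROM companies WHERE sector = 'tech' ORDER BY market_cap DESC LIMIT 10"
  );
  const symbols = rows.map((row) => row.symbol);
  return readDailyPrices(symbols, 0, days - 1);
}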

Upvotes: 7

erik258

Reputation: 16302

Redis has its limits, like anything, but they're pretty high, and if you're clever about it, you can get amazing performance out of redis. If you outgrow one instance you can start thinking about clustering, which should scale relatively linearly to a level where budget is a bigger concern than performance.

Without having a really great grasp of the data you're describing and its relations, it sounds like what you're looking for is a sorted set, perhaps sorted by date. You can ZSCAN a sorted set to move through it sequentially, or you can do lots of other great things against one as well. You might have data that requires a few different things - e.g. a hash for the data itself plus an entry in an index for that hash, or even entries in a few different indexes. A simple Redis list might also do the job for you, since it's inherently ordered by insertion order (this may or may not work for your case, of course; it depends on whether your input is inherently temporally ordered).
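
For example, a rough sketch of the sorted-set idea with ioredis, scoring each candle by its timestamp so a date-range query is a single ZRANGEBYSCORE call (the key name is just an example):

const Redis = require("ioredis");
const redis = new Redis();

async function addCandle(key, candle) {
  // Member is the JSON blob, score is the candle's time, so the set stays
  // ordered by time no matter what order candles arrive in.
  await redis.zadd(key, candle.time, JSON.stringify(candle));
}

async function candlesBetween(key, from, to) {
  const members = await redis.zrangebyscore(key, from, to);
  return members.map((m) => JSON.parse(m));
}

// e.g. candlesBetween("BTC-USD:1h", 1528761600, 1528848000)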

At the end of the day, redis performance is generally dictated by how "well" the data is stored in redis - in other words, how well the native redis capabilities have been mapped into your problem domain. It's pretty easy to use and to program against. I'd highly recommend you look into it.

Upvotes: 1
