Adrian Florea

Reputation: 135

Best storage option for temporary data

For an application I am developing, I need to store a fairly large amount of data for a limited amount of time. Think of the scenario in terms of object buckets: objects sit in some buckets for a limited time, then move to other buckets, and so on. There is basically an n-to-m relation between objects (on the order of millions) and buckets (thousands to maybe tens of thousands).

It is important for this storage layer to be persistent so that, in case of app / server failure, I can recreate the last state.

What would be the best option for implementing such temporary storage?

Thanks

Upvotes: 4

Views: 5476

Answers (2)

Zim-Zam O'Pootertoot

Reputation: 18148

If an object can only belong to one bucket at a time then this should fit well with Postgres - you'd just need a buckets table with a set of unique bucket identifiers, and then the objects' tables would have a currentbucket column to indicate the bucket to which they currently belong.
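A minimal sketch of that one-bucket-per-object layout, assuming Python's built-in sqlite3 (in-memory) as a stand-in for Postgres so it runs without a server. The table names, the `payload` column, and the helpers `move_object` and `objects_in` are illustrative assumptions, not part of the original suggestion.

```python
import sqlite3

# sqlite3 (in-memory) stands in for Postgres here; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
conn.executescript("""
    CREATE TABLE buckets (
        bucket_id TEXT PRIMARY KEY
    );
    CREATE TABLE objects (
        object_id     INTEGER PRIMARY KEY,
        payload       TEXT,
        currentbucket TEXT REFERENCES buckets(bucket_id)
    );
    -- Index so "which objects are in bucket X" does not scan the whole table.
    CREATE INDEX objects_by_bucket ON objects(currentbucket);
""")


def move_object(conn, object_id, bucket_id):
    """Move an object to another bucket with a single-row UPDATE."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE objects SET currentbucket = ? WHERE object_id = ?",
            (bucket_id, object_id),
        )


def objects_in(conn, bucket_id):
    """Return the ids of the objects currently in the given bucket."""
    rows = conn.execute(
        "SELECT object_id FROM objects WHERE currentbucket = ? ORDER BY object_id",
        (bucket_id,),
    )
    return [row[0] for row in rows]


with conn:
    conn.executemany("INSERT INTO buckets VALUES (?)", [("incoming",), ("processed",)])
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [(1, "a", "incoming"), (2, "b", "incoming")],
    )

move_object(conn, 1, "processed")
```

Because every move is a committed UPDATE, the last state survives a crash of the application as long as the database file does.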

If an object can belong to an unbounded number of buckets then you can still use Postgres, but you'll need to remove the currentbucket column from the objects' tables and instead have a bucketobjectjoin table with a column for bucket identifier and a column for object identifier. Since you're already using Postgres I'd recommend implementing it this way as a first pass. If you're not happy with the performance then you can cache the bucketobjectjoin table in Redis (as a Set of bucket identifiers keyed to an object identifier, and/or as a Set of object identifiers keyed to a bucket identifier) - you're only storing the objects' keys (not the full objects) in Redis so memory shouldn't be an issue, and you can have a background task occasionally sync Redis with Postgres's bucketobjectjoin table in case the Redis server crashes.
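A rough sketch of the join-table variant, again assuming in-memory sqlite3 in place of Postgres. Plain Python sets stand in for the two Redis Set layouts mentioned above (buckets keyed by object, objects keyed by bucket); with a real Redis server these would be maintained with commands such as SADD and SREM. The helper names and the choice to rebuild the cache from the table after a crash are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

# sqlite3 (in-memory) stands in for Postgres; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bucketobjectjoin (
        bucket_id TEXT    NOT NULL,
        object_id INTEGER NOT NULL,
        PRIMARY KEY (bucket_id, object_id)
    );
    -- Second index for the reverse lookup (all buckets of one object).
    CREATE INDEX join_by_object ON bucketobjectjoin(object_id);
""")


def add_to_bucket(conn, object_id, bucket_id):
    """Record membership; a repeated insert of the same pair is ignored."""
    with conn:
        # SQLite syntax; in Postgres the equivalent is ON CONFLICT DO NOTHING.
        conn.execute(
            "INSERT OR IGNORE INTO bucketobjectjoin VALUES (?, ?)",
            (bucket_id, object_id),
        )


def remove_from_bucket(conn, object_id, bucket_id):
    """Delete one (bucket, object) membership row."""
    with conn:
        conn.execute(
            "DELETE FROM bucketobjectjoin WHERE bucket_id = ? AND object_id = ?",
            (bucket_id, object_id),
        )


def rebuild_cache(conn):
    """Rebuild both set indexes from the table, the durable source of truth."""
    buckets_of = defaultdict(set)   # object id -> set of bucket ids
    members_of = defaultdict(set)   # bucket id -> set of object ids
    query = "SELECT bucket_id, object_id FROM bucketobjectjoin"
    for bucket_id, object_id in conn.execute(query):
        buckets_of[object_id].add(bucket_id)
        members_of[bucket_id].add(object_id)
    return buckets_of, members_of


add_to_bucket(conn, 1, "a")
add_to_bucket(conn, 1, "b")
add_to_bucket(conn, 2, "a")
remove_from_bucket(conn, 1, "a")  # object 1 leaves bucket "a"
buckets_of, members_of = rebuild_cache(conn)
```

Only identifiers are held in the set indexes, which is why the memory footprint stays small even with millions of objects.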

As a full-blown NoSQL approach you can use Cassandra to store the full objects; Cassandra supports Sets like Redis does, but doesn't have Redis's memory restrictions.

Upvotes: 3

Liviu Costea

Reputation: 3784

We used to keep temporary data in our SQL Server, but then we moved to memcached and saw big performance improvements, mainly because that data lives in memory and not on disk. Since last year we have been using only Redis, which has further advantages, such as more supported data types and no need to split objects because of a size limit.

So if your data is big, changes a lot, and is also temporary, then I suggest Redis. You can run a Redis cluster of 1 master and 2 slaves, and that will provide the HA you are looking for.

Upvotes: 1
