simbo1905

Reputation: 6862

Does ScyllaDB require as much memory as the data set?

Looking at how Scylla is described on compose.com it says:

Scylla requires fast IO and as much RAM as the total data size.

Yet looking at the published architecture it would seem that it doesn't need as much RAM as the total data size as it flushes to disk:

Scylla persists data on disk. Writes to Scylla are initially accumulated in RAM in memtables, which at some point get flushed to an sstable on disk and removed from RAM.

It seems traditional for projects that are touting extreme performance to not mention any tweaks needed to get that performance (e.g. sacrifice data safety by turning off those features for benchmarks, or not mentioning that you have to fit everything in RAM to achieve the published results).

I am wondering: is it the case that everything does need to fit into RAM, or that you only get the benchmark results when it all fits in RAM, or that compose.com is simply wrong (or out of date)?

Unfortunately, googling the question doesn't give a clear answer, so I thought a question on SO would save other folks from confusion.

Upvotes: 2

Views: 832

Answers (3)

Nadav Har'El

Reputation: 13771

We can try to explain away the error on Compose's site, but it is an error: Scylla does not require as much RAM as the data size. Of course, having as much RAM as possible is good, and any unused RAM will be used for caching data, but it is not a requirement, and not even a recommendation. Other statements on the same page even recommend a 10:1 disk:RAM ratio, which is of course very different from recommending a 1:1 ratio.

Compose should be notified that they have an error in their documentation.

Upvotes: 3

Peter Corless

Reputation: 811

As per the Scylla docs, a node is likely to have somewhere between 64 GB and 256 GB of memory, but up to 10 TB of storage.

Let's look at a current AWS instance that we typically run on:

i3.8xl: 244 GiB Memory, 7.6 TB Disk

That's a disk:RAM ratio of roughly 30:1.
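To make the arithmetic explicit, here is a minimal sketch of how that ratio falls out of the instance figures quoted above. The helper name `disk_to_ram_ratio` is made up for illustration, and it assumes the disk figure is in decimal terabytes while the memory figure is in binary gibibytes.

```python
GIB = 2**30   # bytes in one gibibyte (binary unit, used for the memory figure)
TB = 10**12   # bytes in one terabyte (assumed decimal, used for the disk figure)


def disk_to_ram_ratio(disk_tb: float, ram_gib: float) -> float:
    """Return disk capacity divided by RAM, with both converted to bytes."""
    return (disk_tb * TB) / (ram_gib * GIB)


# Figures for the i3.8xl instance quoted above: 244 GiB memory, 7.6 TB disk.
ratio = disk_to_ram_ratio(disk_tb=7.6, ram_gib=244)
print(f"i3.8xl disk:RAM ratio is about {ratio:.0f}:1")  # prints "... about 29:1"
```

That comes out at about 29:1, i.e. roughly 30:1, and nowhere near the 1:1 that the Compose page implies.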

A lot depends on your use case, and YMMV, but that's a typical deployed node.

Upvotes: 5

Glauber Costa

Reputation: 676

I think what they mean is that Scylla will use all the memory available in the system (unless otherwise specified).

Indeed, Scylla is a disk-based system that specializes in dense nodes with a very high disk:memory ratio, so you don't need to have as much RAM as your dataset.

Upvotes: 4
