Dustin Oprea

Reputation: 10266

Best way to store large string in Redis... Getting mixed signals

I'm storing strings on the order of 150 MB. That's well below the maximum string size in Redis, but I'm seeing a lot of different, conflicting opinions on the approach I should take, and no clear path.

On the one hand, I've seen that I should use a hash with small data chunks; on the other, I've been told that approach leads to gapping, and that storing the whole string is most efficient.

Likewise, I've seen that I could pass in the one massive string, or do a bunch of string-append operations to build it up. The latter seems like it might be more efficient than the former.

I'm reading the data from elsewhere, so I'd rather not spool it to a local, physical file just so that I can pass one whole string. Obviously, it'd be better all around if I could chunk the input data and feed it into Redis via appends. However, if that isn't efficient with Redis, it might take forever to feed all of the data one chunk at a time. I'd try it, but I lack the experience, and it might be inefficient for a number of different reasons.
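For concreteness, here's roughly the append-based feed I have in mind, using redis-py (fetch_chunks is just a stand-in for my real data source, and the key name is made up):

    import redis

    r = redis.Redis()

    def fetch_chunks():
        # Stand-in for wherever the data actually comes from.
        yield b"first chunk..."
        yield b"second chunk..."

    r.delete("big:string")             # start clean
    for chunk in fetch_chunks():
        r.append("big:string", chunk)  # APPEND grows the value in place

Each APPEND is a round trip, though, which is part of what I'm worried about.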

That being said, there's a lot of talk of "small" strings and "large" strings, but it's not clear what Redis considers an optimally "small" string: 512 KB? 1 MB? 8 MB?

Does anyone have any definitive remarks?

I'd love it if I could just provide a file-like object or generator to redis-py, but that's more language-specific than I meant this question to be, and most likely impossible at the protocol level anyway: the client would just have to chunk the data internally, when it's probably better to impose that on the developer.

Upvotes: 4

Views: 3418

Answers (1)

Jan Vlcinsky

Reputation: 44112

One option would be:

Storing the data as a long list of chunks

  • Store the data in a List: this lets you write the content as a sequence of chunks and destroy the whole list in one step.
  • Store the data using the pipeline context manager, to ensure you are the only one writing at that moment (see the sketch after this list).
  • Be aware that Redis processes a single request at a time, and all other clients are blocked while it runs. With large files, which take time to write, you can not only slow other clients down, but you may also exceed the max execution time (see the config for this value).
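A minimal sketch of the above with redis-py; the key name is made up, and chunks can be any iterable of byte strings:

    import redis

    r = redis.Redis()

    def store_as_list(key, chunks):
        # With transaction=True, the whole batch runs as a single MULTI/EXEC
        # block, so no other client's writes interleave with ours.
        with r.pipeline(transaction=True) as pipe:
            pipe.delete(key)            # destroy any previous content in one step
            for chunk in chunks:
                pipe.rpush(key, chunk)  # each chunk becomes one list element
            pipe.execute()

Reading it back is then LRANGE (all at once) or chunk-by-chunk LINDEX over the same key.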

Storing the data in a randomly named list with a known pointer

An alternative approach, also using a list, is to invent a random list name, write the content into it chunk by chunk, and, once you are done, update the value at a known key in Redis to point to this randomly named list. Do not forget to remove the old list; this can be done from your code, but you might use expiration instead if it suits your use case.
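A sketch of how that might look; the data: prefix, the GETSET-based swap, and the grace period are my own choices for the example:

    import uuid
    import redis

    r = redis.Redis()

    def store_with_pointer(pointer_key, chunks, grace_seconds=3600):
        list_key = "data:" + uuid.uuid4().hex   # random, collision-safe list name
        for chunk in chunks:
            r.rpush(list_key, chunk)            # nobody sees this list yet
        old = r.getset(pointer_key, list_key)   # atomically repoint readers
        if old is not None:
            r.expire(old, grace_seconds)        # let in-flight readers finish

Readers do a GET on the pointer key first, then read the list it names.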

Upvotes: 2
