Reputation: 8376
I found that Redis has very good features for my project (a webapp's autocomplete back-end). Basically, it is my full-text search engine. Now I am looking for a replacement for Redis, because I can't hold the whole dataset in memory.
I create my Redis store like this (I can't find the link to credit the author of this idea):

1) Every item's text is split into chunks, e.g. 'words' -> ['wor', 'ord', 'rds']
2) ZADD chunk weight items_id
3) SET items_id items_hash_in_json
4) Search works like this: ZINTERSTORE over the chunks of the query, then ZRANGEBYSCORE on the result (a Python sketch follows below).

Plain and simple. Very effective and fast. There are still some smaller cons in such a flow, but mostly I feel I have just the right tools and the right data types for my domain.
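For concreteness, the whole flow looks roughly like this in Python with redis-py (the chunk length, key prefixes, and helper names here are only illustrative assumptions, not my actual code):

import json
import redis

r = redis.Redis()

def chunks(text, size=3):
    # 1) split a term into overlapping substrings: 'words' -> ['wor', 'ord', 'rds']
    text = text.lower()[:40]  # cut off after 40 chars, as described below
    return [text[i:i + size] for i in range(len(text) - size + 1)]

def index_item(item_id, text, weight, item_hash):
    # 2) one sorted set per chunk; member = item id, score = weight
    for chunk in chunks(text):
        r.zadd(f"chunk:{chunk}", {item_id: weight})
    # 3) the item itself as a plain JSON string
    r.set(f"item:{item_id}", json.dumps(item_hash))

def search(query, limit=10):
    keys = [f"chunk:{c}" for c in chunks(query)]
    if not keys:
        return []
    # 4) intersect the sorted sets of the query's chunks ...
    r.zinterstore("tmp:search", keys)
    # ... and page through the intersection by score
    ids = r.zrangebyscore("tmp:search", "-inf", "+inf", start=0, num=limit)
    return [json.loads(r.get(f"item:{i.decode()}")) for i in ids]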
The main problem is: it requires too much memory. I have about 600K items in the database, and on indexing I cut them off after 40 chars, but it still takes 2.5 GB of RAM. That is a bit much for the task. And the dataset will grow; not too much and not too fast, but still.
I have looked at some NoSQL stores now, and I have not found an approach and tools similar to what Redis has. Maybe it is because I now see a hammer for every job, but I feel that with other NoSQL stores I would have to implement such functionality myself (sorted lists, finding their intersection, simple key-value as binary strings, dead-simple data insertion, a simple protocol/API and simple clients).
I'd like to have a Perl binding too, but given a very simple protocol (like REST for CouchDB) it is not mandatory.
Do you know of tools to implement my solution with another NoSQL product?
With the other eye I am already looking at completely different solutions too (like couchdb-lucene), but I'd like to avoid abandoning the system I described above.
Upvotes: 2
Views: 275
Reputation: 4638
HTTP Cache

I have a possible solution for you that I currently use on my site: I cache autocomplete queries as static files using Nginx. Nginx can serve static files very quickly. Here are some sample lines from my config.
http {
    fastcgi_cache_path /var/cache/nginx levels=1:2
                       keys_zone=tt:600m
                       inactive=7d max_size=10g;
    fastcgi_temp_path /var/cache/nginx/tmp;
}
This block defines the path where the cache files will be stored. levels is how many directories deep the cache goes; 1:2 will suffice. My zone here is called tt; name it whatever you want. It is followed by the expiration time and maximum cache size.
location ~ /tt/(.+)\.php$ {
    try_files $uri /index.php?$args;
    fastcgi_index index.php;
    fastcgi_pass 127.0.0.1:9000;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param SCRIPT_NAME $fastcgi_script_name;

    # Caching parameters
    fastcgi_cache tt;
    fastcgi_cache_key "$scheme$request_method$host$request_uri";
    fastcgi_cache_valid 200 302 304 30m;
    fastcgi_cache_valid 301 1h;
    fastcgi_cache_valid any 5m;
    fastcgi_cache_use_stale error timeout invalid_header updating http_500;
}
This location block contains the caching parameters: anything with a URI matching /tt/.*.php will be cached, and the URI plus the query string becomes the cache key.
If you don't use Nginx, the same concept might work with another webserver. I hope this helps.
Edit
From the comments: "Using the index as plain files seems rather slower than SQL queries. Still, I have not benchmarked them."
A cache hit for Nginx will look something like this:

request -> Nginx -> file

A miss:

request -> Nginx -> php/python/ruby -> db (redis/mysql/whatever)
The first path might seem slower because you think of disk I/O, but it's not: the OS automatically caches files that are frequently accessed. So once Nginx heats up, even hitting your PHP backend just to say "Hello world" will be slower in comparison. I make that claim because it's just like serving a static file.
Actual hit/miss rates will depend on the application, data, and configuration. In my experience, people use a lot of the same search terms, so you probably won't end up with 600K files sitting around. Even if you do, it doesn't really hurt; Nginx manages them for you. This method isn't very good if your data changes a lot and you want the search to reflect those changes quickly: you would have to set a short expiry time, which would result in more misses.
Redis Zip Lists/Hashes

http://redis.io/topics/memory-optimization
If you still need sorted sets, make sure the configuration settings from the link are set high enough for your dataset's needs. If you are able to use hashes, you can save a ton of memory using the algorithm they show lower on that page. I think you can definitely use it when storing the item_id that links to a JSON string.
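To make that concrete, the bucketed-hash trick from that page looks roughly like this in Python (the bucket size and key names are my own assumptions; the savings only apply while hash-max-ziplist-entries in redis.conf is at least the bucket size):

import json
import redis

r = redis.Redis()
BUCKET = 1000  # keep <= hash-max-ziplist-entries so buckets stay ziplist-encoded

def set_item(item_id, item_hash):
    # instead of SET item:<id> <json>, pack ~1000 items per small hash so
    # Redis can keep each bucket in the compact ziplist encoding
    r.hset(f"items:{item_id // BUCKET}", item_id % BUCKET, json.dumps(item_hash))

def get_item(item_id):
    raw = r.hget(f"items:{item_id // BUCKET}", item_id % BUCKET)
    return json.loads(raw) if raw is not None else None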
Upvotes: 1
Reputation: 2150
Just a simple idea that could be useful for you. It's not a direct and exact answer to your question.
I suppose that most of your data, or a significant part of it, lives in those JSON documents. In that case I suggest you change your data infrastructure slightly: to keep all the benefits of Redis, use the same first two steps for creating and searching, but change the third step. Instead of using Redis to store the JSON documents, move them to a simple indexed table in your preferred/existing DB. This way you handle just the chunks and keys and perform the operations Redis offers, but at step 3 you take the list of item_ids and retrieve the JSON data from your DB. Probably a SELECT ... WHERE item_id IN (...) will be enough.
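A rough Python sketch of that split, with SQLite standing in for your preferred DB and the chunk index from the question reused as-is (table and key names are illustrative only):

import json
import sqlite3
import redis

r = redis.Redis()
db = sqlite3.connect("items.db")
db.execute("CREATE TABLE IF NOT EXISTS items (item_id INTEGER PRIMARY KEY, doc TEXT)")

def search(query_chunks, limit=10):
    # steps 1 and 2 stay in Redis: intersect the chunk sets, take the top ids
    r.zinterstore("tmp:search", [f"chunk:{c}" for c in query_chunks])
    ids = [int(i) for i in r.zrangebyscore("tmp:search", "-inf", "+inf",
                                           start=0, num=limit)]
    if not ids:
        return []
    # step 3 moves out of Redis: fetch the JSON documents from the indexed table
    marks = ",".join("?" * len(ids))
    rows = db.execute(f"SELECT doc FROM items WHERE item_id IN ({marks})", ids)
    return [json.loads(doc) for (doc,) in rows]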
Upvotes: 0