lukik

Reputation: 4060

Faster way to dump data from redis hash to postgresql table

I'm doing bulk data processing, and for speed's sake I'm first storing the data in Redis before dumping it into a PostgreSQL database at 2-minute intervals. I've used a Redis hash to store the data, and the keys in each hash correspond to the columns in the database table.
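To make that concrete, here's a minimal sketch of how one row might be stored, assuming one Redis hash per row whose fields are named after the table columns (the key pattern and values are only illustrative):

import redis

# one hash per row; field names mirror the table columns (illustrative values)
rdb = redis.Redis(host='localhost', port=6379, db=0)
rdb.hset('data:row_keys:1001', mapping={'id': 1001, 'name': 'widget', 'price': '9.99'})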

I'm using redis.scan() to get the list of Redis hashes storing the rows of data, and then redis.hgetall() to get the data from each hash. From there I build an SQL INSERT statement in SQLAlchemy and perform a bulk insert into the database.

The issue I face is that I first have to extract the keys containing my data using SCAN:

redis_match = 'data:row_keys:*'
row_keys = rdb.scan_iter(match=redis_match, count=limit_no)

From there I fetch all data in each hash:

for index, row_id in enumerate(row_keys):
    row_data = rdb.hgetall(row_id)

The row_data is in the form of key:value pairs, but it's returned as bytes, so I'm incurring extra overhead decoding each key and value manually:

import ast

for key, value in row_data.items():
    key = ast.literal_eval(key.decode('unicode_escape'))
    value = ast.literal_eval(value.decode('unicode_escape'))
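For comparison, a minimal sketch of what on-the-fly decoding could look like with redis-py's decode_responses option (connection details are illustrative, and whether ast.literal_eval is still needed depends on how the values were originally written):

import redis

# decode_responses=True makes hgetall() return str keys/values instead of bytes
rdb = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
row_data = rdb.hgetall('data:row_keys:1001')  # e.g. {'id': '1001', 'name': 'widget'}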

I feel this is too much and there must be a more elegant way to:

  1. Get the data from Redis using hgetall() and use it immediately for a bulk SQL insert, since the keys in the Redis hash correspond to the column names in the PostgreSQL table (a sketch of what I mean is after this list)
  2. Even if 1 is not possible, find a quicker way to get the data from Redis using hgetall() and decode the whole entry on the fly, i.e. all entries in the hash, instead of iterating over each key and value
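To make point 1 concrete, here's a minimal sketch of feeding the hgetall() dictionaries straight into a SQLAlchemy Core bulk insert, assuming the values are already decoded strings and a hypothetical my_table whose columns match the hash fields:

from sqlalchemy import create_engine, MetaData, Table, insert

# hypothetical connection string and table; columns are assumed to match the hash fields
engine = create_engine('postgresql+psycopg2://user:pass@localhost/mydb')
metadata = MetaData()
my_table = Table('my_table', metadata, autoload_with=engine)

# one dict per row, keys are the column names
rows = [rdb.hgetall(row_id) for row_id in rdb.scan_iter(match='data:row_keys:*', count=1000)]

with engine.begin() as conn:
    conn.execute(insert(my_table), rows)  # executemany-style bulk insert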

Edit:

I've read about PostgreSQL's Foreign Data Wrappers, and especially redis_fdw, and I'm wondering whether it's the right tool for moving data from Redis to PostgreSQL faster and with the least trouble possible.

Upvotes: 1

Views: 2311

Answers (1)

thalisk

Reputation: 7743

redis_fdw is the way to go. Just keep in mind that each field of your hash will not become a separate row in the corresponding Pg foreign table. Instead, the foreign table gets a single row per Redis hash, with a Pg array holding all of the hash's fields and values.

E.g. for the following hash in Redis:

myhash = {a:1, b:2}

you could create the foreign table:

CREATE FOREIGN TABLE my_pg_hash (key text, val text[])
SERVER redis_server
OPTIONS (database '0', tabletype 'hash', tablekeyprefix 'myhash');

Foreign table my_pg_hash will contain a single row for the entire Redis hash myhash. This row will have myhash as its key and, as its value, a Postgres array containing all of the field/value pairs of your Redis hash.

SELECT * FROM my_pg_hash;

 key      |    val    
----------+-----------
 myhash   | {a,1,b,2}
(1 row)

You can split the val array into separate rows using Pg's unnest() function:

SELECT key, unnest(val) FROM my_pg_hash;

  key   | unnest 
--------+--------
 myhash | a
 myhash | 1
 myhash | b
 myhash | 2
(4 rows)
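If you want the hash fields and values as separate columns instead of alternating rows, a sketch (assuming the same foreign table) is to step through the array two elements at a time:

-- val[i] is the field name, val[i + 1] its value
SELECT key,
       val[i]     AS field,
       val[i + 1] AS value
FROM   my_pg_hash,
       generate_series(1, array_length(val, 1), 2) AS i;

For the example hash above this yields one row per field: (myhash, a, 1) and (myhash, b, 2).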

Upvotes: 1
