Allan Wax

Reputation: 141

Looking for an explanation for high memory usage in Redis

We have a large quantity of data that we store in Redis. More precisely, we have a very large number of keys, each with a tiny amount of data associated with it. Each key is 8 bytes long and each value is 8 bytes long (a long value). There are 1 billion keys (yes, billion).

From what I can find out about the structure of Redis storage (https://redislabs.com/blog/redis-ram-ramifications-part-i/ and https://github.com/antirez/sds/blob/master/README.md), an 8-byte key carries an overhead of 8 bytes for the sds header plus 1 byte for the trailing null, i.e. 17 bytes. Assuming the allocator rounds that up to at least 24 bytes, adding the 8-byte long value gives 32 bytes per key.

A billion keys would then be about 32 GB. The measured usage is 158 GB. There is, of course, overhead, but a 5:1 ratio seems large. Can anyone explain this or point to ways to reduce the memory usage?
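
Spelling out my arithmetic (the rounding of the 17-byte sds allocation up to 24 bytes is my assumption about the allocator), and what the measured number works out to per key:

public class PerKeyEstimate {
    public static void main(String[] args) {
        final long keys = 1_000_000_000L;

        // My estimate: 8-byte key + 8-byte sds header + 1 trailing null = 17 bytes,
        // assumed to round up to 24 by the allocator, plus the 8-byte long value.
        final long estimatedPerKey = 24 + 8; // 32 bytes
        System.out.println("estimated: " + estimatedPerKey + " bytes/key, "
                + (keys * estimatedPerKey) / 1_000_000_000L + " GB total");

        // What the measurement implies (taking GB as 10^9 bytes).
        final long measuredTotal = 158_000_000_000L;
        System.out.println("measured:  ~" + measuredTotal / keys + " bytes/key");
    }
}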

I have included my test program based on Jedis.

import java.security.SecureRandom;
import java.text.DecimalFormat;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException;

public class Test8byteKeys {
    protected static JedisCluster cluster = null;
    protected static final ExecutorService executor;

    protected static volatile boolean shuttingDown = false;

    private static final int AVAILABLE_PROCESSORS = Runtime.getRuntime().availableProcessors();

    static {
        final int cores = Math.max(4, (AVAILABLE_PROCESSORS * 3) / 4);
        executor = new ThreadPoolExecutor(cores, cores, //
                15, TimeUnit.SECONDS, //
                new LinkedBlockingQueue<>(cores), //
                new ThreadPoolExecutor.CallerRunsPolicy());

        System.out.println("Running with " + cores + " threads");
    }

    static private GenericObjectPoolConfig getPoolConfiguration() {
        GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();

        poolConfig.setLifo(true);
        poolConfig.setTestOnBorrow(true);
        poolConfig.setTestOnReturn(false);
        poolConfig.setBlockWhenExhausted(true);
        poolConfig.setMinIdle(1);
        poolConfig.setMaxTotal(101);
        poolConfig.setTestWhileIdle(false);
        poolConfig.setSoftMinEvictableIdleTimeMillis(3000L);
        poolConfig.setNumTestsPerEvictionRun(5);
        poolConfig.setTimeBetweenEvictionRunsMillis(5000L);
        poolConfig.setJmxEnabled(true);

        return poolConfig;
    }

    private static void connectToCluster() {
        try {
            Set<HostAndPort> nodes = new HashSet<>();
            String hap /* host and port */ = System.getProperty("hap", null);
            if (hap == null) {
                System.err.println("You must supply the host and port of a master in the cluster on the command line");
                System.err.println("java -Dhap=<host:port> -jar <jar> ");
                System.exit(1);
            }

            String[] parts = hap.split(":"); // assume ipv4 address
            nodes.add(new HostAndPort(parts[0].trim(), Integer.valueOf(parts[1].trim())));

            System.out.println("Connecting to " + hap);
            cluster = new JedisCluster(nodes, getPoolConfiguration());
        }
        catch (Exception e) {
            System.err.println("Could not connect to redis -- " + e.getMessage());
            System.exit(1);
        }
    }

    private static final Thread shutdown = new Thread(new Runnable() {
        // Clean up at exit
        @Override
        public void run() {
            shuttingDown = true;

            System.out.println((new Date()).toString() + "\t" + "Executor shutdown in progress");

            try {
                executor.shutdown();
                executor.awaitTermination(10L, TimeUnit.SECONDS);
            }
            catch (Exception e) {
                // ignore
            }
            finally {
                try {
                    // shutdown() may return before queued tasks finish; force-stop any stragglers
                    if (!executor.isTerminated()) {
                        executor.shutdownNow();
                    }
                }
                catch (Exception e) {
                    //ignore
                }
            }

            try {
                cluster.close();
            }
            catch (Exception e) {
                System.err.println("cluster disconnection failure: " + e);
            }
            finally {
                //
            }

            System.out.println((new Date()).toString() + "\t" + "shutdown complete.");
        }
    });

    final static char[] CHARACTERS = { //
            '0', '1', '2', '3', '4', '5', //
            '6', '7', '8', '9', 'a', 'b', //
            'c', 'd', 'e', 'f', 'g', 'h', //
            'i', 'j', 'k', 'l', 'm', 'n', //
            'o', 'p', 'q', 'r', 's', 't', //
            'u', 'v', 'w', 'x', 'y', 'z', //
            'A', 'B', 'C', 'D', 'E', 'F', //
            'G', 'H', 'I', 'J', 'K', 'L', //
            'M', 'N', 'O', 'P', 'Q', 'R', //
            'S', 'T', 'U', 'V', 'W', 'X', //
            'Y', 'Z', '#', '@' //
    };

    protected final static byte[] KEY_EXISTS_MARKER = { '1' };

    static class Runner implements Runnable {
        private byte[] key = null;

        public Runner(byte[] key) {
            this.key = key;
        }

        @Override
        public void run() {
            if (!shuttingDown) {
                try {
                    cluster.set(key, KEY_EXISTS_MARKER);
                    cluster.expire(key, 60 * 60 * 48); // for test purposes, only keep around for 2 days
                }
                catch (JedisClusterMaxRedirectionsException e) {
                    System.err.println(
                            (new Date()).toString() + "\tIGNORING\t" + e + "\t" + "For key " + new String(key));
                }
                catch (Exception e) {
                    System.err.println((new Date()).toString() + "\t" + e + "\t" + "For key " + new String(key));
                    e.printStackTrace();
                    System.exit(1);
                }
            }
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        DecimalFormat decimal = new DecimalFormat("#,##0");
        final byte[] randomBytes = new byte[8];

        connectToCluster();

        Runtime.getRuntime().addShutdownHook(shutdown);

        System.out.println((new Date()) + " Starting test");

        for (int i = 0; i < 1000000000; i++) {
            random.nextBytes(randomBytes);
            final byte[] key = new byte[8];
            for (int j = 0; j < randomBytes.length; j++)
                key[j] = (byte) (CHARACTERS[((randomBytes[j] & 0xFF)) % CHARACTERS.length] & 0xFF);

            try {
                if (shuttingDown) {
                    System.err.println((new Date()).toString() + "\t" + "Main loop terminating due to shutdown");
                    break;
                }

                if (i % 1000000 == 0)
                    System.out.println((new Date()).toString() + "\t" + decimal.format(i));

                try {
                    executor.submit(new Runner(key));
                }
                catch (Exception e) {
                    System.err.println((new Date()).toString() + "\t" + e);
                }
            }
            catch (Exception e) {
                System.err.println("Failed to set key " + new String(key) + " -- " + e);
            }
        }

        if (!shuttingDown) {
            System.out.println((new Date()) + " Done");
            System.exit(0);
        }
    }
}

Upvotes: 1

Views: 784

Answers (3)

Andre P

Reputation: 21

Virtually every memory manager has internal overhead for every object you allocate, simply to track the object. For example, when you call free(), the memory manager may need some info about the object to determine which memory pool/page it belongs to. Small objects might fall into one pool and use a different allocation mechanism than larger objects.

Very much like Redis's own sds.c/sds.h, the heap manager usually also adds its own structure to every malloc()'d object.

If your heap has an overhead of 16 bytes per object, adding that to each 10 KB malloc() would be imperceptible. However, with 8-byte keys in Redis, 16 bytes of overhead per key exceeds the size of the keys themselves.

You can find a bit more info about malloc chunks and fastbins here: http://iarchsys.com/?p=764

A quick and dirty check of this overhead would be to increase your keys from 8 bytes to 16. Although you're doubling the size of memory used by the keys, you will probably not see a doubling of the memory consumed by the Redis process.
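
A rough sketch of that check with Jedis, run against a small throwaway standalone instance rather than your cluster (host, port and key counts below are placeholders), would be to read used_memory from INFO before and after loading keys of each length:

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

import redis.clients.jedis.Jedis;

public class KeySizeOverheadCheck {

    // Parse used_memory (bytes) out of the INFO memory section.
    static long usedMemory(Jedis jedis) {
        for (String line : jedis.info("memory").split("\r\n")) {
            if (line.startsWith("used_memory:")) {
                return Long.parseLong(line.substring("used_memory:".length()).trim());
            }
        }
        throw new IllegalStateException("used_memory not found in INFO output");
    }

    // Load 'count' random keys of 'keyLength' bytes into an empty instance and
    // report the average growth of used_memory per key.
    static long bytesPerKey(Jedis jedis, int keyLength, int count) {
        SecureRandom random = new SecureRandom();
        jedis.flushAll(); // throwaway test instance only!
        long before = usedMemory(jedis);

        byte[] value = "1".getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[keyLength];
        for (int i = 0; i < count; i++) {
            random.nextBytes(key);
            jedis.set(key, value);
        }
        return (usedMemory(jedis) - before) / count;
    }

    public static void main(String[] args) {
        // Placeholder host/port and key count -- adjust for your environment.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println(" 8-byte keys: ~" + bytesPerKey(jedis, 8, 1_000_000) + " bytes/key");
            System.out.println("16-byte keys: ~" + bytesPerKey(jedis, 16, 1_000_000) + " bytes/key");
        }
    }
}

If the per-key cost barely moves when you go from 8-byte to 16-byte keys, the fixed per-key overhead (allocator chunk headers plus Redis's own bookkeeping) is what dominates your memory usage.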

Upvotes: 2

Itamar Haber

Reputation: 49932

This requires deeper analysis, but one thing that's obvious is that the overhead calculation is wrong (probably my fault for not completing the blog series - sorry ;)).

Every key in Redis, regardless of its type/name/value, has an overhead. The overhead, IIRC, was about 70 bytes for v3.2.10. However, that overhead was measured on much smaller datasets (far fewer than 1B keys), and if I'm not mistaken a bigger global dictionary incurs more overhead per key. Add to that the value itself and its string overhead, and you easily get to 80 bytes per key and about 80 GB in total.

That said, I can't explain the remaining 2x factor without actually recreating this in a lab. It could be that the cluster has additional overheads that need to be considered. As a next step, I recommend starting with a smaller data set and comparing standalone vs. cluster memory usage. You may also want to test against the latest version of Redis (4), as it includes several memory-usage related optimizations.
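
A rough way to do that comparison with Jedis, assuming you load the same smaller data set into a standalone instance and into the cluster (host names and ports below are placeholders), is to read used_memory from the standalone node and sum it across the cluster's nodes:

import java.util.Collections;
import java.util.Map;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.JedisPool;

public class StandaloneVsClusterMemory {

    // Parse used_memory (bytes) out of the INFO memory section.
    static long usedMemory(Jedis jedis) {
        for (String line : jedis.info("memory").split("\r\n")) {
            if (line.startsWith("used_memory:")) {
                return Long.parseLong(line.substring("used_memory:".length()).trim());
            }
        }
        throw new IllegalStateException("used_memory not found in INFO output");
    }

    public static void main(String[] args) throws Exception {
        // Standalone instance loaded with the (smaller) test data set.
        long standalone;
        try (Jedis jedis = new Jedis("standalone-host", 6379)) {
            standalone = usedMemory(jedis);
        }

        // Sum used_memory over every node of the cluster loaded with the same data set.
        // Note: getClusterNodes() returns all known nodes, so exclude replicas if present.
        long clusterTotal = 0;
        try (JedisCluster cluster = new JedisCluster(
                Collections.singleton(new HostAndPort("cluster-host", 7000)))) {
            for (Map.Entry<String, JedisPool> node : cluster.getClusterNodes().entrySet()) {
                try (Jedis jedis = node.getValue().getResource()) {
                    clusterTotal += usedMemory(jedis);
                }
            }
        }

        System.out.println("standalone used_memory: " + standalone + " bytes");
        System.out.println("cluster    used_memory: " + clusterTotal + " bytes");
    }
}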

Upvotes: 1

darshan kamat

Reputation: 424

You should consider partitioning your Redis instance into multiple instances.

Upvotes: -1
