radai

Reputation: 24192

infinispan file store size disproportionate to data size

i wrote a small infinispan cache PoC (code below) to try and assess infinispan performance. while running it i found that, for my configuration, infinispan apparently does not clear old copies of cache entries from disk, leading to disk space consumption that is orders of magnitude higher than expected.

what can i do to bring disk usage down to roughly the size of the actual data?

here's my test code:

import org.infinispan.AdvancedCache;
import org.infinispan.manager.DefaultCacheManager;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Random;

public class App {
    final static int ELEMENTS_PER_BIN = 1000;
    final static int NUM_OF_BINS = 100;

    public static void main(String[] args) throws Exception {
        File storeFile = new File("store/store.dat");
        if (storeFile.exists() && !storeFile.delete()) {
            throw new IllegalStateException("unable to delete store file from previous run");
        }

        DefaultCacheManager cm = new DefaultCacheManager("infinispan.xml");
        AdvancedCache<String, Bin> cache = cm.<String,Bin>getCache("store").getAdvancedCache();

        Random rng = new Random(System.currentTimeMillis());

        for (int i=0; i<ELEMENTS_PER_BIN; i++) {
            for (int j=0; j<NUM_OF_BINS; j++) {
                String key = "bin-"+j;
                Bin bin = cache.get(key); //get from cache
                if (bin==null) {
                    bin = new Bin();
                }
                bin.add(rng.nextLong()); //modify
                cache.put(key, bin); //write back
            }
        }

        long expectedSize = 0;

        for (int j=0; j<NUM_OF_BINS; j++) {
            String key = "bin-"+j;
            Bin bin = cache.get(key);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(baos);
            oos.writeObject(bin);
            oos.flush();
            oos.close();
            expectedSize += baos.size();
            baos.close();
        }

        long actualSize = new File("store/store.dat").length();

        System.err.println(ELEMENTS_PER_BIN+" elements x "+NUM_OF_BINS+" bins. expected="+expectedSize+" actual="+actualSize+" in "+cache.size()+" elements. diff="+(actualSize/(double)expectedSize));

        cm.stop(); //shut down the cache manager so its non-daemon threads don't keep the JVM alive
    }

    public static class Bin implements Serializable{
        private long[] data = null;
        public void add(long datum) {
            data = data==null ? new long[1] : Arrays.copyOf(data, data.length+1); //expand capacity
            data[data.length-1] = datum;
        }
    }
}

and here's the infinispan configuration:

<infinispan
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:infinispan:config:6.0 http://www.infinispan.org/schemas/infinispan-config-6.0.xsd"
        xmlns="urn:infinispan:config:6.0">
    <namedCache name="store">
        <eviction strategy="LRU" maxEntries="20"/>
        <persistence passivation="false">
            <singleFile location="store">
                <async enabled="false"/>
            </singleFile>
        </persistence>
    </namedCache>
</infinispan>

infinispan is (supposed to be?) configured as a write-through cache, with the 20 most recently used elements in RAM and a live copy of everything on disk.

running the above code gives this:

1000 elements x 100 bins. expected=807300 actual=411664404 in 100 elements. diff=509.92741731698254

which means that for 788 KBytes of data i end up with a ~392 MB file!

what am i doing wrong?

the version of infinispan in question is 6.0.2.Final

Upvotes: 0

Views: 1067

Answers (1)

Radim Vansa

Reputation: 5888

When you store records that only ever grow, the space used by their previous versions is never reclaimed. SingleFileStore has no defragmentation policy: free space is kept as a map of lists of entry slots, and adjacent free slots are not merged. Since every new version of a bin is larger than any freed slot, each write is appended at the end of the file while the beginning stays fragmented and unused. In your test each put() rewrites a bin that is 8 bytes longer than before, so the file accumulates every version of every bin: roughly 100 bins × Σ(i=1..1000) 8·i bytes ≈ 400 MB before per-entry overhead, which is in line with the 411,664,404 bytes you measured.
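One way to mitigate this on your side (a sketch of my own, not an Infinispan feature; the `PaddedBin` class and its doubling policy are my suggestion, not part of your code) is to grow the array geometrically instead of one element at a time. The serialized size then takes only about log2(n) distinct values, so most rewrites produce an entry of exactly the same size as the version they replace, and a free slot of that size can be reused:

```java
import java.io.Serializable;
import java.util.Arrays;

public class PaddedBin implements Serializable {
    private long[] data = new long[0]; // capacity is always 0 or a power of two
    private int size = 0;              // number of slots actually used

    public void add(long datum) {
        if (size == data.length) {
            // double the capacity: the serialized form now only changes when
            // a power-of-two boundary is crossed, so most put()s rewrite an
            // entry of identical size and the store can reuse the freed slot
            data = Arrays.copyOf(data, Math.max(1, data.length * 2));
        }
        data[size++] = datum;
    }

    public int size() {
        return size;
    }
}
```

The trade-off is up to 2x padding in each serialized value, which is far cheaper than the unbounded fragmentation you are seeing.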

By the way, to compute the expected size properly you should also:

  • use JBoss Marshalling instead of Java Serialization
  • serialize the key as well
  • serialize Infinispan metadata (such as entry lifespan, last use time, possibly version etc...)
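As a rough illustration of the second point, the key contributes bytes to every stored entry too. This sketch uses plain Java serialization for simplicity (Infinispan actually marshals entries with JBoss Marshalling, so the real on-disk numbers will differ; `EntrySize` and `serializedSize` are names I made up for the example):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class EntrySize {
    // serialize a single object with plain Java serialization and return its byte size
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        }
        return baos.size();
    }

    public static void main(String[] args) throws IOException {
        int valueBytes = serializedSize(new long[1000]); // a fully grown bin's payload
        int keyBytes = serializedSize("bin-42");         // the key is stored with the entry
        // the on-disk entry holds at least key + value, plus Infinispan metadata
        System.out.println("value=" + valueBytes + " key=" + keyBytes
                + " entry>=" + (valueBytes + keyBytes));
    }
}
```

Summing only the serialized values, as your test does, therefore underestimates the true per-entry footprint.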

Upvotes: 2
