Adel Boutros

Reputation: 10285

Fastest way to get all values from a Map where the key starts with a certain expression

Suppose you have a Map<String, Object> myMap.

Given the expression "some.string.*", I have to retrieve all the values from myMap whose keys start with this expression.

I am trying to avoid for loops because myMap will be queried with a set of expressions, not just one, and looping over the whole map for each expression becomes cumbersome performance-wise.

What is the fastest way to do this?

Upvotes: 28

Views: 46620

Answers (6)

Martin Andersson

Reputation: 19573

The accepted answer works in 99% of all the cases, but the devil is in the details.

Specifically, the accepted answer does not work when the map has a key which begins with the prefix, followed by Character.MAX_VALUE, followed by anything else. Comments posted to the accepted answer yield small improvements, but still do not cover all of the cases.

The following solution also uses NavigableMap to pick out a sub map given a key prefix. The solution is the subMapFrom() method. The trick is not to bump (increment) the last char of the prefix, but rather the last char that is not MAX_VALUE, while cutting off all trailing MAX_VALUEs. So for example, if the prefix is "abc" we increment it to "abd". But if the prefix is "ab" + MAX_VALUE we drop the last char and bump the preceding char instead, resulting in "ac".

import static java.lang.Character.MAX_VALUE;

import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.stream.Stream;

public class App
{
    public static void main(String[] args) {
        NavigableMap<String, String> map = new TreeMap<>();
        
        String[] keys = {
                "a",
                "b",
                "b" + MAX_VALUE,
                "b" + MAX_VALUE + "any",
                "c"
        };
        
        // Populate map
        Stream.of(keys).forEach(k -> map.put(k, ""));
        
        // For each key that starts with 'b', find the sub map
        Stream.of(keys).filter(s -> s.startsWith("b")).forEach(p -> {
            System.out.println("Looking for sub map using prefix \"" + p + "\".");
            
            // Always returns expected sub maps with no misses
            // [b, b￿, b￿any], [b￿, b￿any] and [b￿any]
            System.out.println("My solution: " +
                    subMapFrom(map, p).keySet());
            
            // WRONG! Prefix "b" misses "b￿any"
            System.out.println("SO answer:   " +
                    map.subMap(p, true, p + MAX_VALUE, true).keySet());
            
            // WRONG! Prefix "b￿" misses "b￿" and "b￿any"
            System.out.println("SO comment:  " +
                    map.subMap(p, true, tryIncrementLastChar(p), false).keySet());
            
            System.out.println();
        });
    }
    
    private static <V> NavigableMap<String, V> subMapFrom(
            NavigableMap<String, V> map, String keyPrefix)
    {
        final String fromKey = keyPrefix, toKey; // undefined
        
        // Alias
        String p = keyPrefix;
        
        if (p.isEmpty()) {
            // No need for a sub map
            return map;
        }
        
        // ("ab" + MAX_VALUE + MAX_VALUE + ...) returns index 1
        final int i = lastIndexOfNonMaxChar(p);
        
        if (i == -1) {
            // Prefix is all MAX_VALUE through and through, so grab rest of map
            return map.tailMap(p, true);
        }
        
        if (i < p.length() - 1) {
            // Target char for bumping is not last char; cut out the residue
            // ("ab" + MAX_VALUE + MAX_VALUE + ...) becomes "ab"
            p = p.substring(0, i + 1);
        }
        toKey = bumpChar(p, i);
        
        return map.subMap(fromKey, true, toKey, false);
    }
    
    private static int lastIndexOfNonMaxChar(String str) {
        int i = str.length();
        
        // Walk backwards, while we have a valid index
        while (--i >= 0) {
            if (str.charAt(i) < MAX_VALUE) {
                return i;
            }
        }
        
        return -1;
    }
    
    private static String bumpChar(String str, int pos) {
        assert !str.isEmpty();
        assert pos >= 0 && pos < str.length();
        
        final char c = str.charAt(pos);
        assert c < MAX_VALUE;
        
        StringBuilder b = new StringBuilder(str);
        b.setCharAt(pos, (char) (c + 1));
        return b.toString();
    }
    
    private static String tryIncrementLastChar(String p) {
        char l = p.charAt(p.length() - 1);
        return l == MAX_VALUE ?
                // Last character already max, do nothing
                p :
                // Bump last character
                p.substring(0, p.length() - 1) + ++l;
    }
}

Output:

Looking for sub map using prefix "b".
My solution: [b, b￿, b￿any]
SO answer:   [b, b￿]
SO comment:  [b, b￿, b￿any]

Looking for sub map using prefix "b￿".
My solution: [b￿, b￿any]
SO answer:   [b￿, b￿any]
SO comment:  []

Looking for sub map using prefix "b￿any".
My solution: [b￿any]
SO answer:   [b￿any]
SO comment:  [b￿any]

I should perhaps add that I also tried various other approaches, including code I found elsewhere on the internet. All of them failed, either by yielding an incorrect result or by crashing outright with various exceptions.

Upvotes: 4

Justinas Jakavonis

Reputation: 8798

Remove all keys which do not start with your desired prefix:

yourMap.keySet().removeIf(key -> !key.startsWith(keyPrefix));
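
Note that removeIf on the key set modifies yourMap in place. If the original map has to stay intact, a non-mutating variant might look like the sketch below. The class name PrefixFilter and the method valuesWithPrefix are made up for illustration, and the lookup is still a full scan of the map for every prefix.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PrefixFilter {

    // Hypothetical helper: collects the values whose key starts with keyPrefix.
    // The source map is only read, never modified. Cost is O(N) per prefix.
    static <V> List<V> valuesWithPrefix(Map<String, V> map, String keyPrefix) {
        return map.entrySet().stream()
                .filter(e -> e.getKey().startsWith(keyPrefix))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> yourMap = new HashMap<>();
        yourMap.put("some.string.a", 1);
        yourMap.put("some.string.b", 2);
        yourMap.put("some.byte.a", 3);

        // Order of the values follows the HashMap's iteration order.
        System.out.println(valuesWithPrefix(yourMap, "some.string."));
        // The map still holds all three entries.
        System.out.println(yourMap.size());
    }
}
```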

Upvotes: 2

stemm

Reputation: 6040

If you work with a NavigableMap (e.g. TreeMap), you can take advantage of the underlying tree data structure and do something like this (with O(lg(N)) complexity):

public SortedMap<String, Object> getByPrefix( 
        NavigableMap<String, Object> myMap, 
        String prefix ) {
    return myMap.subMap( prefix, prefix + Character.MAX_VALUE );
}

More expanded example:

import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class Test {

    public static void main( String[] args ) {
        TreeMap<String, Object> myMap = new TreeMap<String, Object>();
        myMap.put( "111-hello", null );
        myMap.put( "111-world", null );
        myMap.put( "111-test", null );
        myMap.put( "111-java", null );

        myMap.put( "123-one", null );
        myMap.put( "123-two", null );
        myMap.put( "123--three", null );
        myMap.put( "123--four", null );

        myMap.put( "125-hello", null );
        myMap.put( "125--world", null );

        System.out.println( "111 \t" + getByPrefix( myMap, "111" ) );
        System.out.println( "123 \t" + getByPrefix( myMap, "123" ) );
        System.out.println( "123-- \t" + getByPrefix( myMap, "123--" ) );
        System.out.println( "12 \t" + getByPrefix( myMap, "12" ) );
    }

    private static SortedMap<String, Object> getByPrefix(
            NavigableMap<String, Object> myMap,
            String prefix ) {
        return myMap.subMap( prefix, prefix + Character.MAX_VALUE );
    }
}

Output is:

111     {111-hello=null, 111-java=null, 111-test=null, 111-world=null}
123     {123--four=null, 123--three=null, 123-one=null, 123-two=null}
123--   {123--four=null, 123--three=null}
12      {123--four=null, 123--three=null, 123-one=null, 123-two=null, 125--world=null, 125-hello=null}
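
To tie this back to the question, where several expressions such as "some.string.*" are applied to the same map, the range lookup could be repeated once per expression. The following is only a sketch under a few assumptions: the class MultiPrefixLookup and the helpers toPrefix and valuesFor are hypothetical names, a trailing "*" is treated as the only wildcard, and the same upper bound as getByPrefix is used, so keys with Character.MAX_VALUE directly after the prefix remain an edge case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MultiPrefixLookup {

    // Hypothetical helper: turns "some.string.*" into the prefix "some.string.".
    static String toPrefix(String expression) {
        return expression.endsWith("*")
                ? expression.substring(0, expression.length() - 1)
                : expression;
    }

    // One range lookup per expression against the same sorted map.
    static <V> List<V> valuesFor(NavigableMap<String, V> map, List<String> expressions) {
        List<V> result = new ArrayList<>();
        for (String expression : expressions) {
            String prefix = toPrefix(expression);
            // Same bounds as getByPrefix: [prefix, prefix + MAX_VALUE).
            result.addAll(map.subMap(prefix, prefix + Character.MAX_VALUE).values());
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> myMap = new TreeMap<>();
        myMap.put("some.string.a", 1);
        myMap.put("some.string.b", 2);
        myMap.put("some.byte.a", 3);
        myMap.put("other.key", 4);

        // Prints [1, 2, 4]
        System.out.println(valuesFor(myMap, Arrays.asList("some.string.*", "other.*")));
    }
}
```

If two expressions overlap (for example "some.*" and "some.string.*"), the same value is added twice; collecting into a Set would be one way to avoid that.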

Upvotes: 44

OldCurmudgeon

Reputation: 65811

I wrote a MapFilter recently for just such a need. You can also filter filtered maps, which makes them really useful.

If your expressions have common roots like "some.byte" and "some.string" then filtering by the common root first ("some." in this case) will save you a great deal of time. See main for some trivial examples.

Note that making changes to the filtered map changes the underlying map.

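import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class MapFilter<T> implements Map<String, T> {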

    // The enclosed map -- could also be a MapFilter.
    final private Map<String, T> map;

    // Use a TreeMap for predictable iteration order.
    // Store Map.Entry to reflect changes down into the underlying map.
    // The Key is the shortened string. The entry.key is the full string.
    final private Map<String, Map.Entry<String, T>> entries = new TreeMap<>();
    // The prefix they are looking for in this map.
    final private String prefix;

    public MapFilter(Map<String, T> map, String prefix) {
        // Store my backing map.
        this.map = map;
        // Record my prefix.
        this.prefix = prefix;
        // Build my entries.
        rebuildEntries();
    }

    public MapFilter(Map<String, T> map) {
        this(map, "");
    }

    private synchronized void rebuildEntries() {
        // Start empty.
        entries.clear();
        // Build my entry set.
        for (Map.Entry<String, T> e : map.entrySet()) {
            String key = e.getKey();
            // Retain each one that starts with the specified prefix.
            if (key.startsWith(prefix)) {
                // Key it on the remainder.
                String k = key.substring(prefix.length());
                // Entries k always contains the LAST occurrence if there are multiples.
                entries.put(k, e);
            }
        }

    }

    @Override
    public String toString() {
        return "MapFilter(" + prefix + ") of " + map + " containing " + entrySet();
    }

    // Constructor from a properties file.
    public MapFilter(Properties p, String prefix) {
        // Properties extends Hashtable<Object,Object> so it implements Map.
        // I need Map<String,T> so I wrap it in a HashMap for simplicity.
        // Java-8 breaks if we use diamond inference.
        this(new HashMap<String, T>((Map) p), prefix);
    }

    // Helper to fast filter the map.
    public MapFilter<T> filter(String prefix) {
        // Wrap me in a new filter.
        return new MapFilter<>(this, prefix);
    }

    // Count my entries.
    @Override
    public int size() {
        return entries.size();
    }

    // Are we empty.
    @Override
    public boolean isEmpty() {
        return entries.isEmpty();
    }

    // Is this key in me?
    @Override
    public boolean containsKey(Object key) {
        return entries.containsKey(key);
    }

    // Is this value in me.
    @Override
    public boolean containsValue(Object value) {
        // Walk the values.
        for (Map.Entry<String, T> e : entries.values()) {
            if (value.equals(e.getValue())) {
                // Its there!
                return true;
            }
        }
        return false;
    }

    // Get the referenced value - if present.
    @Override
    public T get(Object key) {
        return get(key, null);
    }

    // Get the referenced value - if present.
    public T get(Object key, T dflt) {
        Map.Entry<String, T> e = entries.get((String) key);
        return e != null ? e.getValue() : dflt;
    }

    // Add to the underlying map.
    @Override
    public T put(String key, T value) {
        T old = null;
        // Do I have an entry for it already?
        Map.Entry<String, T> entry = entries.get(key);
        // Was it already there?
        if (entry != null) {
            // Yes. Just update it.
            old = entry.setValue(value);
        } else {
            // Add it to the map.
            map.put(prefix + key, value);
            // Rebuild.
            rebuildEntries();
        }
        return old;
    }

    // Get rid of that one.
    @Override
    public T remove(Object key) {
        // Do I have an entry for it?
        Map.Entry<String, T> entry = entries.get((String) key);
        if (entry != null) {
            entries.remove(key);
            // Change the underlying map.
            return map.remove(prefix + key);
        }
        return null;
    }

    // Add all of them.
    @Override
    public void putAll(Map<? extends String, ? extends T> m) {
        for (Map.Entry<? extends String, ? extends T> e : m.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    // Clear everything out.
    @Override
    public void clear() {
        // Just remove mine.
        // This does not clear the whole underlying map, only the filtered entries.
        for (String key : entries.keySet()) {
            map.remove(prefix + key);
        }
        entries.clear();
    }

    @Override
    public Set<String> keySet() {
        return entries.keySet();
    }

    @Override
    public Collection<T> values() {
        // Roll them all out into a new ArrayList.
        List<T> values = new ArrayList<>();
        for (Map.Entry<String, T> v : entries.values()) {
            values.add(v.getValue());
        }
        return values;
    }

    @Override
    public Set<Map.Entry<String, T>> entrySet() {
        // Roll them all out into a new TreeSet.
        Set<Map.Entry<String, T>> entrySet = new TreeSet<>();
        for (Map.Entry<String, Map.Entry<String, T>> v : entries.entrySet()) {
            entrySet.add(new Entry<>(v));
        }
        return entrySet;
    }

    /**
     * An entry.
     *
     * @param <T> The type of the value.
     */
    private static class Entry<T> implements Map.Entry<String, T>, Comparable<Entry<T>> {

        // Note that entry in the entry is an entry in the underlying map.

        private final Map.Entry<String, Map.Entry<String, T>> entry;

        Entry(Map.Entry<String, Map.Entry<String, T>> entry) {
            this.entry = entry;
        }

        @Override
        public String getKey() {
            return entry.getKey();
        }

        @Override
        public T getValue() {
            // Remember that the value is the entry in the underlying map.
            return entry.getValue().getValue();
        }

        @Override
        public T setValue(T newValue) {
            // Remember that the value is the entry in the underlying map.
            return entry.getValue().setValue(newValue);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Entry)) {
                return false;
            }
            Entry e = (Entry) o;
            return getKey().equals(e.getKey()) && getValue().equals(e.getValue());
        }

        @Override
        public int hashCode() {
            return getKey().hashCode() ^ getValue().hashCode();
        }

        @Override
        public String toString() {
            return getKey() + "=" + getValue();
        }

        @Override
        public int compareTo(Entry<T> o) {
            return getKey().compareTo(o.getKey());
        }

    }

    // Simple tests.
    public static void main(String[] args) {
        String[] samples = {
                "Some.For.Me",
                "Some.For.You",
                "Some.More",
                "Yet.More"};
        Map map = new HashMap();
        for (String s : samples) {
            map.put(s, s);
        }
        Map all = new MapFilter(map);
        Map some = new MapFilter(map, "Some.");
        Map someFor = new MapFilter(some, "For.");
        System.out.println("All: " + all);
        System.out.println("Some: " + some);
        System.out.println("Some.For: " + someFor);

        Properties props = new Properties();
        props.setProperty("namespace.prop1", "value1");
        props.setProperty("namespace.prop2", "value2");
        props.setProperty("namespace.iDontKnowThisNameAtCompileTime", "anothervalue");
        props.setProperty("someStuff.morestuff", "stuff");
        Map<String, String> filtered = new MapFilter(props, "namespace.");
        System.out.println("namespace props " + filtered);
    }

}

Upvotes: 5

Adriaan Koster

Reputation: 16209

I used this code to do a speed trial:

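import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class KeyFinder {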

    private static Random random = new Random();

    private interface Receiver {
        void receive(String value);
    }

    public static void main(String[] args) {
        for (int trials = 0; trials < 10; trials++) {
            doTrial();
        }
    }

    private static void doTrial() {

        final Map<String, String> map = new HashMap<String, String>();
        giveRandomElements(new Receiver() {
            public void receive(String value) {
                map.put(value, null);
            }
        }, 10000);

        final Set<String> expressions = new HashSet<String>();
        giveRandomElements(new Receiver() {
            public void receive(String value) {
                expressions.add(value);
            }
        }, 1000);

        int hits = 0;
        long start = System.currentTimeMillis();
        for (String expression : expressions) {
            for (String key : map.keySet()) {
                if (key.startsWith(expression)) {
                    hits++;
                }
            }
        }
        long stop = System.currentTimeMillis();
        System.out.printf("Found %s hits in %s ms\n", hits, stop - start);
    }

    private static void giveRandomElements(Receiver receiver, int count) {
        for (int i = 0; i < count; i++) {
            String value = String.valueOf(random.nextLong());
            receiver.receive(value);
        }

    }
}

The output was:

Found 0 hits in 1649 ms
Found 0 hits in 1626 ms
Found 0 hits in 1389 ms
Found 0 hits in 1396 ms
Found 0 hits in 1417 ms
Found 0 hits in 1388 ms
Found 0 hits in 1377 ms
Found 0 hits in 1395 ms
Found 0 hits in 1399 ms
Found 0 hits in 1357 ms

This counts how many of 10000 random keys start with any one of 1000 random String values (10M checks).

So about 1.4 seconds on a simple dual core laptop; is that too slow for you?
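
For comparison, the same count could be run against sorted keys, so that each expression needs one range lookup instead of a scan over all keys. This is only a sketch: SortedKeyFinder and countHits are hypothetical names, a TreeSet stands in for the map's key set, and no timing is claimed here since it depends on the machine. The keys are decimal numbers, so the Character.MAX_VALUE edge case does not arise.

```java
import java.util.HashSet;
import java.util.NavigableSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class SortedKeyFinder {

    // Counts keys that start with any of the expressions,
    // using one range view [expression, expression + MAX_VALUE) per expression.
    static int countHits(NavigableSet<String> sortedKeys, Set<String> expressions) {
        int hits = 0;
        for (String expression : expressions) {
            hits += sortedKeys.subSet(expression, true,
                    expression + Character.MAX_VALUE, false).size();
        }
        return hits;
    }

    public static void main(String[] args) {
        Random random = new Random();

        // Same sizes as the trial above: 10000 keys, 1000 expressions.
        NavigableSet<String> keys = new TreeSet<>();
        for (int i = 0; i < 10000; i++) {
            keys.add(String.valueOf(random.nextLong()));
        }
        Set<String> expressions = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            expressions.add(String.valueOf(random.nextLong()));
        }

        long start = System.nanoTime();
        int hits = countHits(keys, expressions);
        long elapsedMicros = (System.nanoTime() - start) / 1000;
        System.out.printf("Found %d hits in %d us%n", hits, elapsedMicros);
    }
}
```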

Upvotes: 1

shift66

Reputation: 11958

A map's key set has no special structure, so I think you have to check each of the keys anyway. So you can't find a way that is faster than a single loop...

Upvotes: 1
