Reputation: 113
I have a requirement to create a java cache which holds all the cities and airports. So, if i query the cache for a location, lets say a city, it should return all the airports in that city and if I query a location which is an airport, i should get back that airport. Also, each location has to be stored as a byte array in cache.(as the exposed interface for querying the cache has byte[] as the parameter for location) Other considerations are:
What I have got so far:
Approach 1
Create a thin wrapper over byte[] array, lets say ByteWrapper. Put each location(both airports and cities) as a key in map(TreeMap?). Use lists of ByteWrapper(containing airports where ever applicable) as values.
Approach 2
Create multi dimensional byte[] array which is sorted on location. Its essentially a map. Then use binary search to locate the key and return results.
What approach would you suggest? Please let me know in case you have better ideas Thanks
Upvotes: 1
Views: 1261
Reputation: 113
SO this is what I have done so far:
private static byte[][][] cache = null; // this is the actual cache
// this map has ByteArrayWrapper(a wrapper over byte[]) as key which
// can be an airport or city and index of corresponding
// airport/airports in byte[][][]cache as value
Map<ByteArrayWrapper, Integer> byteLocationIndexes = null;
/**
* This is how cache is queried. You can pass an airport or city as a location parameter
* It will fetch the corresponding airport/airports
*/
private byte[][] getAllAirportsForLocation(ByteArrayWrapper location) {
byte[][] airports = null;
airports = byteLocationIndexes.get(location)== null ? null : cache[byteLocationIndexes.get(location).intValue()];
return airports;
}
I bench marked performance using both String as key in indexMap (and using String[][] cache)and ByteArrayWrapper as key (and byte[] as cache). There is a 15-20% improvement if I use ByteArrayWrapper and byte[][][] cache.
What else can be done to improve the performance? Would it help if I use some other implementation of Map? As the cache is loaded only once and never changes, it can be sorted. Most time is taken in key look up in byteLocationIndexes and that is the bottle neck. I am already calculating hashCode at the time of object creation and storing it as a local variable in ByteArrayWrapper.
Any suggestions?
Upvotes: 0
Reputation: 12770
Give a try to approach 1, as byte[] is an Object type you could use something like:
Map<byte[], List<byte[]>> cache = ...
It's probably the simplest approach, you just will have to choose the implementation for your Map. Probably you should go with a HashMap because it's the simplest...
As gustavc said using a HashMap won't work, so you could instead use a TreeMap with a given comparator:
Map<byte[], List<byte[]>> m = new TreeMap<byte[], List<byte[]>>(new Comparator<byte[]>() {
public int compare(byte[] o1, byte[] o2) {
int result = (o1.length < o2.length ? -1 : (o1.length == o2.length ? 0 : 1));
int index = 0;
while (result == 0 && index < o1.length) {
result = (o1[index] < o2[index] ? -1 : (o1[index] == o2[index] ? 0 : 1));
index++;
}
return result;
}
});
Upvotes: 0
Reputation: 27234
The fact that the exposed API is byte[] based shouldn't necessarily dictate the internal details of your cache.
Second observation is that this is not a generalized data structure problem. Both the space of all airports and space of all cities are finite, and well known. (You even know the size).
Hash maps, trees, etc. are all algorithms that guarantee certain performance characteristics within established bounds for general usage.
Since data integrity is a non-issue ("data doesn't change") and if space considerations are not critical ("as fast as possible"), then why not:
[Edit: this bit somehow cut lost in the cut and paste: You index (number) your cities and airports, given that you know these sets and they are effectively static.]
// these need to get initialized on startup
// this initialization can be optimized.
Map<byte[], Long> airportIndexes = new HashMap<byte[], Long>(NUMBER_OF_AIRPORTS);
Map<byte[], Long> citiesIndexes = new HashMap<byte[], Long>(NUMBER_OF_CITIES);
Map<Long, byte[]> airports = new HashMap<Long, byte[]>(NUMBER_OF_AIRPORTS);
Map<Long, byte[]> cities = new HashMap<Long, byte[]>(NUMBER_OF_CITIES);
long[][] airportToCitiesMappings = new byte[NUMBER_OF_AIRPORTS][];
long[][] citiesToAirportMappings = new byte[NUMBER_OF_CITIES][];
public List<byte[]> getCitiesNearAirport(byte[] airportName) {
Long[] cityIndexes = getCitiesByIdxNearAirport(airportName);
List<byte[]> cities = new ArrayList<byte[]>(cityIndexes.length);
for(Long cityIdx : cityIndexes) {
cities.add(cities.get(cityIdx));
}
return cities;
}
public long[] getCitiesByIdxNearAirport(Long airportIdx) {
return airportToCitiesMappings[airportIdx];
}
public long[] getCitiesNearAirport(byte[] airportName) {
return getCitiesNearAirport(airportIndexes.get(airportName));
}
public long[] getCitiesNearAirport(Long airportIdx) {
return airportToCitiesMappings[airportIdx];
}
// .. repeat above pattern for airports.
That should give you O(1) time performance characteristics. There is considerable redundancy in terms of space.
Upvotes: 1
Reputation: 5145
You do not need Byte Arrays, Strings would be just fine.
How often you would add items to this cache? I am guessing it is completely static, since they are not making new cities or airports every day.
So, what you can do is using two MultiHashMaps, one keying on the city and another keying on airports. Checkout Google Multimap http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html
If you are using mySQL by any chance, you can actually use a table based on Memory Storage Engine.
Many Databases can pin a table in memory, definitely Oracle can, so that is another way to go.
Upvotes: 0