Reputation: 1167
Suppose I have a large list (around 10,000 entries) of string triples as such:
car noun yes
dog noun no
effect noun yes
effect verb no
Suppose I am presented with a string double - for example, (effect, verb) - and I need to quickly look in the list to see if the pair appears and, if it does, whether its value is yes or no. (For this example the double does appear and the value is "no".)
What is the best data structure in Java to store the list and the most efficient way to perform the search? I am running hundreds of thousands of these searches so speed is of the essence.
Thanks!
Upvotes: 1
Views: 2160
Reputation: 12091
You might consider using a HashMap<YourDouble, String>
. Searches will be O(1).
You could either create an object, YourDouble which holds the first two values, or else append one to the other -- if values will still be unique -- and use HashMap<String, String>
.
Upvotes: 5
Reputation: 27252
10k doesn't seem that large to me. Have you tried a DB?
The place to look for information like this is the Semantic Web. A number of projects work on Triple Stores of just this type. There's a list at the bottom of the Triple Store page of implementations.
As far as java is concerned your algorithms are almost certainly going to be language dependent and if you find a good algorithm implemented in C its java port will also be fast.
Also, what's your data set look like? Are there a lot of 2 matches such that subject and verb are often the same? How many matches are you expecting to get? MapReduce will work work well for finding one match in 10k but won't work as well doing a query that returns a 8k of 10k where the query can't be easily partitioned.
There's a query language made just for this problem too: SPARQL. The bigdata blog has some good insights, though again 10k doesn't seem that large.
Upvotes: 1
Reputation: 18633
You could use a HashMap
where the key is the concatenation of the first two strings, the ones which you'll use for lookups, and the value is a Boolean, representing the yes
and no
strings.
Alternatively, it seems the words in the second column would be fewer, since they represent categories. You could have a HashMap<String, HashMap<String, Boolean>>
where you first index by e.g. "noun", "verb" etc. and then you index by e.g. "car", "dog", "effect", to get to your boolean. This would probably be more space-efficient.
Upvotes: 1
Reputation: 1503489
I would create a HashMultimap
for each type of search you want, e.g. "all three", "each pair", and "each single field". When you build the list, populate all the different maps, then you can fetch from whichever map is appropriate for your query.
(The downside is that you'll need a type for at least each arity, e.g. use just String for the "single field" maps, but a Pair
for the two-field maps, and a Triple
for the three-field map.)
Upvotes: 1