Reputation: 4334
I have a very large database of Objects (read an array of key/value pairs, like [{}, {}, {}]
in standard C notation), and I need to be able to search for any value of any key within that set of pairs and find the object which contains it (I'll be using fuzzy searching or similar string comparison algorithms). One approach I can think of would be to create an enormous master object with a key referencing to the original object for each value inside the object:
DB = [
{
"a": 45,
"b": "Hello World"
},
{
"a": 32,
"b": "Testing..."
}
]
// ... Generation Code ... //
search = {
45: {the 0th object},
"Hello World": {the 0th object},
32: {the 1st object},
"Testing...": {the 1st object}
}
This solution at least reduces the problem to a large number of comparisons, but are there better approaches? Please note that I have very little formal Computer Science training so I may be missing some major detail simplifying or proving impossible this problem.
P.S. Is this too broad? If so, I'll gladly delete it
Upvotes: 0
Views: 209
Reputation: 10066
Your combined index is more suitable for a full-text search, but doesn't indicate in which property of an object the value is found. An alternative technique that provides more context is to build an index per property.
This should be faster both in preparation and during lookup on property-specific searchers (e.g. a == 32
) since for n objects and p properties, a binary search (used in both inserts and lookups) would require log(np) comparisons on a combined index and log(n) on a single-property index.
In either case, you need to watch out for multiple occurrences of the same value. You can store an array of offsets as the value of each index entry, rather than just a single value.
For example:
search = {
"a": {
45: [0],
32: [1]
},
"b": {
"Hello World": [0],
"Testing...": [1]
}
}
Upvotes: 1