Reputation: 1
I'm running into a problem with a project I'm working on in my spare time. I'm using Google App Engine (Java version), but this question is not specific to that platform, and I would consider other languages/platforms if they could solve the problem.
The following illustrates the problem:
Suppose I have a datastore with thousands of recipes, and the ingredients for each recipe. (For the sake of this illustration, forget about measurements.) I want to be able to enter a list of ingredients that I have on hand, and then quickly retrieve all recipes for which I have at least XX% of the ingredients (let's say 75%). I'm willing to sacrifice some accuracy and some results for speed, but do want a certain degree of accuracy. I can do a more thorough comparison after I get the "quick results."
My attempt at a solution: Analyzing the database of recipes, I compile a list of, say, 200 common food ingredients (eggs, flour, salt, sugar, rosemary, etc). Almost all the ingredients for the recipes are contained within this master list:
Common Food Ingredients: [ eggs , flour , salt , sugar , cinnamon ... ]
Then, I go through each individual recipe and compare the ingredients to this master list, and end up with a set of 200 booleans for each recipe:
Recipe #106: [ T , T , F , T , F ... ]
Recipe #107: [ F , T , T , T , F ... ]
I would store this information with the recipes. (Up to this point, it's all data prep work, which I have all the time in the world to do.)
Now, I enter my list of ingredients on hand. I would do the same comparison with the master list:
My ingredients on hand: [ F , F , T , T , F ... ]
And this is where I'm stuck. How can I quickly compare this set of booleans against the sets for the recipes so I can identify recipes for which I have at least 75% of the ingredients?
Or (and this would be the holy grail), during the data preparation, instead of storing the set of booleans themselves with each recipe, is there a calculation I can perform that will give me a single value I can later filter off of? (E.g., "SELECT * FROM recipes WHERE master_list_boolean_metric <= 29")
Or am I going about this the wrong way? (Any guidance, general or specific, would be appreciated.) What I want to avoid is doing a slow comparison, ingredient by ingredient, between each recipe and my list of "on-hand" ingredients.
Or... perhaps it isn't possible to do this quickly?
Upvotes: 0
Views: 75