Reputation: 2027
First I will paste the scenario and then pose my question:
Suppose you have a list of Categories, for example:
Food,Meat,Dairy,Fruit,Vegetable,Grain,Wheat,Barley
Now you have a list of items that fits into one or more of the categories listed above.
Here is a sample list of items:
Pudding,Cheese,Milk,Chicken,Barley,Bread,Couscous,Fish,Apple,Tomato,
Banana,Grape,Lamb,Roast,Honey,Potato,Rice,Beans,Legume,Barley Soup
As you see every item fits into at least one category, it could fit into more, or possibly all but the minimum is always one.
For example Cheese
is a Food
and Dairy
.
Each item has two attributes:
1) A Price Tag
2) A Random Value
A set is defined as having every category mapped to an item.
In other words all categories must be present in a set.
A set from the items above could be:
[Pudding,Lamb,Milk,Apple,Tomato,Legume,Bread,Barley Soup]
As you see each item is mapped to a category slot:
My question is, what is the most efficient algorithm for generating in-order sets of the above categories from a list of items given.
The best set is defined as having the highest Random Value in total.
The only constraint is that any generated set cannot, in total, exceed a certain fixed amount, in other words, all generated sets should be within this Price Cap.
Hope I am clear, thank you!
Upvotes: 8
Views: 360
Reputation: 25557
I'm not exactly sure what you mean by "generating in-order sets".
I think any algorithm is going to generate sets, score them, and then try to generate better sets. Given all the constraints, I do not think you can generate the best set efficiently in one pass.
The 0-1 knapsack problem has been shown to be NP-hard, which means there is no known polynomial time (i.e. O(n^k)) solution. That problem is the same as you would have if, in your input, the random number was always equal to the price and there was only 1 category. In other words, your problem is at least as hard as the knapsack problem, so you cannot expect to find a guaranteed polynomial time solution.
You can generate all valid sets combinatorially pretty easily using nested loops: loop per category, looping over the items in that category. Early on you can improve the efficiency by skipping over an item if it has already been chosen and by skipping over the whole set once you find it is over the price cap. Put those results in a heap and then you can spit them out in order.
If your issue is that you want something with better performance than that, it seems to me more like a constraint programming, or, more specifically, a constraint satisfaction problem. I suggest you look at the techniques used to handle those kinds of problems.
Upvotes: 0
Reputation: 2355
Alright, here is my second try to answer this question.
Lets say we have following input
class Item {
public:
// using single unsigned int bitwise check sets
unsigned int category;
int name;
int value;
int price;
...
};
class ItemSet
{
public:
set<Item> items;
int sum;
};
First, sort input data based on highest random value, then lowest price
bool operator<(const Item& item1, const Item& item2) {
if (item1.value == item2.value) {
if (item1.price == item2.price) {
return item1.name < item2.name;
}
return item1.price > item2.price;
}
return item1.value < item2.value;
}
...
vector<Item> v = generateTestItem();
sort(v.begin(), v.end(), greater<Item>());
Next use backtracking to collect top sets into heap until conditions met. Having sorted input data in our backtracking algorithm guarantees that collected data is top based on highest value
and lowest price
. One more thing to note, I compared item categories (currentCats
) with bit manipulation which gives O(1)
complexity.
priority_queue<ItemSet> output;
void helper(vector<Item>& input, set<Item>& currentItems, unsigned int currentCats, int sum, int index)
{
if (index == input.size()) {
// if index reached end of input, exit
addOutput(currentItems);
return;
}
if (output.size() >= TOP_X) {
// if output data size reached required size, exit
return;
}
if (sum + input[index].price < PRICE_TAG) {
if ((input[index].category & currentCats) == 0) {
// this category does not exists in currentCats
currentItems.insert(input[index]);
helper(input, currentItems, currentCats | input[index].category, sum + input[index].price, index + 1);
}
} else {
addOutput(currentItems);
return;
}
if (currentItems.find(input[index]) != currentItems.end()) {
currentItems.erase(input[index]);
}
helper(input, currentItems, currentCats, sum, index + 1);
return;
}
void getTopItems(vector<Item>& items)
{
set<Item> myset;
helper(items, myset, 0, 0, 0);
}
In the worst case this backtracking would run O(2^N) complexity, but since TOP_X
is limited value it should not take too long in real.
I tried to test this code by generating random values and it seems working fine. Full code can be found here
Upvotes: 1
Reputation: 484
What you are trying to achieve is a form of maximal matching, and I don't know if there is an efficient way to compute in-order sets, but still this reduction might help you.
Define a bipartite graph with on one side one node per category, and on the other side one node per item. Add an edge between an item and a category if that items belongs in that category, with a weight defined by the random value of the item.
A "set" as you defined it is a maximum-cardinality matching in that graph. They can be enumerated in reasonable time, as proved by Takeaki Uno in "A Fast Algorithm for Enumerating Non-Bipartite Maximal Matchings", and it is likely to be even faster in your situation because your graph is bipartite.
Among those sets, you are looking for the ones with maximal weight and under a price constraint. Depending on your data, it may be enough to just enumerate them all, filter them based on the price, and sort the remaining results if there are not too many.
If that is not the case, then you may find the best set by solving the combinatorial optimization problem whose objective function is the total weight, and whose constraints are the price limit and the cardinal (known as maximum-weighted matching in the litterature). There may even be solvers already online once you write the problem in this form. However, this will only provide one such set rather than a sorted list, but as this problem is very hard in the general case, this is the best you can hope for. You would need more assumptions on your data to have better results (like bounds on the maximum number of such sets, the maximum number of items that can belong to more than k categories, etc.)
Upvotes: 3