Matthew Rittinghouse
Matthew Rittinghouse

Reputation: 55

Calculating an optimum set of rows given certain constraints in R

Sorry for the vague sounding title, but here's the set up:

I have a list of parts, each with a manufacturer, a cost, and a profit. I'll add a snippet, but this would be a long list (dozens of manufacturers, hundreds of parts).

Manufacturer    Part Name   Cost    Profit
Cohiba          Behike 54   10.95   5.05
Rocky Patel     Edge        13.99   8.01
Acid            Liquid      8.49    3.51

I have a code that samples each unique manufacturer to select one part at random for each, and then calculates the total sum cost and profit.

ind <- sapply (unique( data$Manufacturer ) , function(x)
  sample( which(data$Manufacturer==x) , 1 ) )

Sampler <- data[ ind, ]

sum(Sampler$Profit)

sum(SamplerX$Cost)

I feel like there has to be a smarter way to ask it to simply find the optimum list of one unique part per manufacturer to give me the highest profit for the lowest cost. Can anyone give me some insight?

Upvotes: 2

Views: 152

Answers (2)

PavoDive
PavoDive

Reputation: 6496

For completeness:

The knapsack problem is one where a robber wants to maximize the price of the stolen items, while keeping weight at or below his sack capacity. The adagio package has a function to solve it.

library(adagio)
# create some random data:
set.seed(1)
weights <- sample(1:100,30,FALSE)
prices <- sample(1:10000,30,FALSE)

# find what is the total weight
sum(weights)
[1] 1383

# Solve the problem, allowing a capacity of about 10% the total weight:
a <- knapsack(w=weights, p=prices, cap=138)

# See what a returns:
a
$capacity
[1] 138

$profit
[1] 50928

$indices
[1]  1  5 10 11 12 19 22 27

# validate results:
sum(weights[a$indices])
[1] 138

Please keep in mind that you'll need a lot of capacity if your vectors are large.

####### EDIT TO ADD #######

Considering that you want to maximize profit while keeping cost below certain limit, AND not exceeding more than certain number of manufacturers (one, in your question), this is a two-dimensional knapsack problem, for which I didn't find any function or package that solves it.

Alternatives:

  1. Code it yourself: a good start would be adagio::knapsack (without the parenthesis, so you can see the code), and googling "two dimensional knapsack". There are plenty of algorithms in pseudo-code so you won't start from a blank sheet.
  2. A workaround: If your output vector isn't really large, you could use adagio::knapsack() disregarding manufacturer and find a near-solution. Then you'll have to manually find wich manufacturers are duplicated in the result vector, and then find an item that is as close as possible below the cost of the item to replace and that belongs to a different, not-yet-used manufactor, with the highest profit possible. Please notice that this won't necessarily yield the best available solution, i.e., an optimum (the problem is NP-hard, so anyway it probably wouldn't), but it would be a good approximation.

Upvotes: 1

Matthew Rittinghouse
Matthew Rittinghouse

Reputation: 55

I have to give full credit to PavoDive for helping me solve this. His use of the phrase "Knapsack Problem" is what got my head spinning, as I hadn't heard a version of the knapsack riddle since middle school.

Once he said that, I was able to quickly connect the dots and actually found that a package exists already to solve Knapsack problems with this exact setup:

http://www.inside-r.org/packages/cran/adagio/docs/knapsack

The answer I needed is right there. Sharing this in case anyone else ever needs to solve this problem.

Upvotes: 0

Related Questions