frazman
frazman

Reputation: 33223

python: algorithm - to gather items from mean

Not sure whether this is the right place, but I have a question related to algorithm and I cant think of an efficient algorithm. So thought of sharing my problem statement.. :) To ease up what I am trying to explain, let me create a hypothetical example.

Suppose, I have a list which contains an object whcih contains two things..

lets say product id and price

Now, this is a long long list..sort of like an inventory.. out of this I have defined three price segments.. lowprice, midprice and highprice and then k1,k2,k3 where k1,k2 and k3 are ratios. So, the job is now,, I have to gather products from this huge inventory in such a way that there is n1 products from lowprice range, n2 products from midprice range and n3 products from high price range... where n1:n2:n3 == k1:k2:k3

Now, how do I efficiently achieve the following. I target the low price point is 100 dollars and I have to gather 20 products from this range.. mid price range is probably 500 dollars and so on

So I start with 100 dollars.. and then look for items between 90 and 100 and also between 100 and 110 Let say I found 5 products in interval 1 low (90,100) and 2 products in interval 1 high (100,110) Then, I go to next low interval and next high interval. I keep on doing this until I get the number of products in this interval.

How do I do this?? Also there might be case, when the number of products in a particular price range is less than what I need.. (maybe mid price range is 105 dollars...).. so what should I do in that case.. Please pardon me, if this is not the right platform.. as from the question you can make out that this is more like a debative question rather than the "I am getting this error" type of question. Thanks

Upvotes: 3

Views: 217

Answers (3)

moooeeeep
moooeeeep

Reputation: 32502

Approach 1

If the assignment what products belong to the three price segments never changes, why don't you simply build 3 lists, one for the products in each price segment (assuming these sets are disjoint). Then you may pick from these lists randomly (either with or without replacement - as you like). The number of items for each class is given by the ratios.

Approach 2

If the product-price-segment assignment is intended to be pre-specified, e.g., by passing corresponding price values for each segment on function call, you may want to have the products sorted by price and use a binary search to select the m-nearest-neighbors (for example). The parameter m could be specified according to the ratios. If you specify a maximum distance you may reject products that are outside the desired price range.

Approach 3

If the product-price-segment assignment needs to be determined autonomously, you could apply your clustering algorithm of choice, e.g., k-means, to assign your products to, say, k = 3 price segments. For the actual product selection you may proceed similarly as described above.

Upvotes: 2

amit
amit

Reputation: 178431

You are probably looking for selection algorithm.
First find the n1'th smallest element, let it be e1, and the lower bound list is all elements such that element <= e1.
Do the same for the other ranges.

pseudo code for lower bound list:

getLowerRange(list,n):
  e <- select(list,n) 
  result <- []
  for each element in list:
     if element <= e:
         result.append(element)
  return result

Note that this solution fails if there are many "identical" items [result will be a bigger list], but finding those items, and removing it from result list is not hard.

Note that selection algorithm is O(n), so this algorithm will consume linear time related to your list's size.

Upvotes: 2

Niclas Nilsson
Niclas Nilsson

Reputation: 5901

It's seems like you should try a database solution rather then using a list. Check out sqlite. It's in Python by default

Upvotes: 0

Related Questions