David542
David542

Reputation: 110382

Using SQL/Python to build probability distribution

Suppose the following jobs table:

`jobpost`
 - name
 - position
 - is_featured (boolean)

I would like to build a list of suggested jobs for a given user, where jobpost.position matches the user's position (for example, an accountant would receive jobs in accounting).

The basic query to accomplish this would be something like:

SELECT name FROM jobpost WHERE jobpost.position IN (list of user positions) LIMIT 10

I also want to make sure that jobs that are featured (is_featured=True) receive extra weight. Then I need to build a probility distribution list from which a random number of jobs would be selected. For this I was thinking of building a python list of tuples, with the job name and probability, and then using random.random(). For example, something like (in pseudocode):

x = [('job 1', 0.2), ('job 2', 0.2), ('job 3', 0.2),  ('job 4', 0.4)]
# pick three out of the list of jobs above
random.random.sample(x,  3)

I have three questions related to this:

  1. Does this seem like the right approach?

  2. How would I use the random module (or another one) to select n number of objects with each object having a certain given probability?

  3. In terms of giving a weighted average to a featured job over a non-featured job, would the following query be the correct approach? If not, what would be a better way?

    SELECT name, 1 * (CASE WHEN is_featured=True THEN % ELSE 1) as weighted_average FROM ...

This would give me tuple with the job name and the relative weight.

Upvotes: 1

Views: 679

Answers (1)

Blender
Blender

Reputation: 298364

I'd recommend using SQLAlchemy and make use of Python's syntax, not SQL's. But to solve your problem, here's what I'd try:

import random

x = [('job 1', 0.2), ('job 2', 0.2), ('job 3', 0.2), ('job 4', 0.2), ('job 5', 0.4)]

def random_choice(jobs_list, number=1):
  jobs = jobs_list[:]
  choice = random.uniform(0, 1)
  total = 0

  while total < number:
    index, job_item = random.choice(list(enumerate(jobs)))
    job, weight = job_item

    if weight > random.uniform(0, 1):
      yield job
      jobs.pop(index)
      total += 1

print list(random_choice(x, 3))

Just as a side note, in order for your "weights" to follow a discrete PDF, they need to add up exactly to 1.0.

Truthfully, I don't think this "weighting" is a proper solution to the problem. You don't have a proper PDF to create the discrete "weight" for a given job. Instead, why not just have separate categories for jobs?

Upvotes: 1

Related Questions