tiajuana6
tiajuana6

Reputation: 37

Duplicate strings in nested lists

Set up - I have a list with nested lists (call it relay sets) that looks something like this:

relay_sets = [
    [[80100, 'a']], 
    [[75000, 'b']], 
    [[64555, 'c']], 
    [[55000, 'd'], [44000, 'e']], 
    [[39000, 'f'], [2000, 'g']], 
    [[2000, 'h'], [999, 'i'], [999, 'j']], 
    [[999, 'k'], [999, 'l'], [343, 'm']]
]

A user is put in to one of the relay sets (with uniform probability) and then again uses one of the nested lists in the sublist with equal probability. So for example a user has a 1/7 chance of being put in [[80100, 'a']] but then is guaranteed to use [80100, 'a'], whereas a user has 1/7 chance of being put in [[55000, 'd'], [44000, 'e']] and then has a 1/2 chance of using [55000, 'd'].

I wrote a script that finds the probability of using a certain relay:

def prob_using_relay(relay, relaysets):
  prob1 = 1 / float(len(relaysets))
  for index, item in enumerate(relaysets):
    for i in item:
      if relay in i:
        num_relays_in_set = len(relaysets[index])
        prob2 = 1 / float(num_relays_in_set)
  total_prob = float(prob1 * prob2)
  return total_prob

print prob_using_relay(80100, relay_sets) # gives correct answer of 0.142857142857

My problem is I don't know how to account for duplicates. So if I wanted to know the chance of using a relay with 999 bandwidth the answer should be (2/3 * 1/7 + 2/3 * 1/7) but my script gives 1/21. Or if I wanted to know the chance of using a relay with 2000 bandwidth the answer should be (1/2 * 1/7 + 1/3 * 1/7) instead of (1/2 * 1/7).

Could anyone help with altering my function to account for duplicates?

Upvotes: 0

Views: 76

Answers (3)

Jose Varez
Jose Varez

Reputation: 2077

Also, you can use Fractions:

from fractions import Fraction

def prob_using_relay(relay, relaysets):
     prob = Fraction(0,1)
     for set in relaysets:
       if isinstance(set, list) and relay in set:
           prob += Fraction(1, len(relaysets))
       elif isinstance(set, list):
           prob += Fraction(1, len(relaysets)) * prob_using_relay(relay, set)           
     return prob

prob_using_relay(999, relay_sets)
Out: Fraction(4, 21)

Upvotes: 0

inspectorG4dget
inspectorG4dget

Reputation: 114035

def prob(bandwidth, relaysets):
    answer = 0.0
    for subl in relaysets:
        answer += sum(1.0 for r in subl if r[0] == bandwidth)/len(subl)
    return answer/len(relaysets)

Output:

In [44]: prob(999, relay_sets)
Out[44]: 0.19047619047619047

In [45]: 4/21
Out[45]: 0.19047619047619047

Upvotes: 1

simonzack
simonzack

Reputation: 20938

There are lots of problems with your code. You are doing useless casts to floats. Another being that it can't handle probabilities more than 2 levels deep. Here's a recursive approach which can handle this:

relay_sets = [
    [[80100, 'a']],
    [[75000, 'b']],
    [[64555, 'c']],
    [[55000, 'd'], [44000, 'e']],
    [[39000, 'f'], [2000, 'g']],
    [[2000, 'h'], [999, 'i'], [999, 'j']],
    [[999, 'k'], [999, 'l'], [343, 'm']]
]

def prob_using_relay(relay, relaysets):
    if isinstance(relaysets[0], int):
        return int(relaysets[0] == relay)
    return sum(prob_using_relay(relay, relayset) for relayset in relaysets) / len(relaysets)

print(prob_using_relay(80100, relay_sets))

Upvotes: 1

Related Questions