Comparing distributions of values to a specific value

I have 1000 distributions, each containing 459 floats between 0.0 and 1.0, stored in the variable prop_list_test2.

I also have 1000 values, stored as p_95_null, one for each distribution. For each distribution, I am trying to find the proportion of its values that are >= its p_95_null counterpart. So the first distribution in prop_list_test2 is compared against the first value in p_95_null, and so on, until I have a list pv of 1000 proportions.

Here is my attempt, although it's a very messy and non-Pythonic way of going about it:

pv = []
index = 0

comp = p_95_null[index] #What we're comparing it to
truth_list = []

while index<len(p_95_null):
    test_list = [] #Which distribution from prop_list_test2 we are using
    truth_list = []    
    for i in prop_list_test2[index]:
        test_list.append(i)

    for i in test_list:
        if i >= comp:
            truth_list.append(True)
            test_list = []
            index+=1
        elif i < comp:
            truth_list.append(False)
            test_list = []
            index+=1

    pv.append((sum(truth_list)/len(truth_list)))


print(pv)

My output is [0.06318082788671024, 0.058823529411764705, 0.058823529411764705]. Something isn't working: I was expecting 1000 values in pv, but I only get 3. Which part of my code is causing this? I can't seem to figure it out.

Upvotes: 0

Views: 86

Answers (1)

Marat

Reputation: 15738

This is a more Pythonic way to do it:

pv = [sum(v >= p_95 for v in values)/len(values)
      for values, p_95 in zip(prop_list_test2, p_95_null)]

Explanation:

  • Overall, this (pv = [... for ... in ...]) is a list comprehension, Python syntax for building a list by mapping over a sequence.
  • zip(...) pairs each list of float values with its p95 threshold, so you can iterate without juggling indexes.
  • The left part is much the same as the last line of your code. The only difference is that the inner for loop is replaced with a generator expression, which is passed directly to sum. Each comparison yields True or False, and sum counts each True as 1.
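
To see the comprehension at work, here is a minimal sketch on small made-up data. The names `distributions`, `thresholds` and the helper `proportions_at_or_above` are illustrative stand-ins, not from the question; swap in `prop_list_test2` and `p_95_null` for the real data.

```python
def proportions_at_or_above(distributions, thresholds):
    """For each (distribution, threshold) pair, return the
    fraction of values in the distribution that are >= threshold."""
    return [
        # Each comparison is True/False; sum() counts True as 1.
        sum(v >= t for v in values) / len(values)
        for values, t in zip(distributions, thresholds)
    ]

# Made-up data: 3 short distributions instead of 1000 x 459 floats.
distributions = [
    [0.1, 0.5, 0.9, 0.7],
    [0.2, 0.2, 0.8, 0.4],
    [0.3, 0.6, 0.6, 0.6],
]
thresholds = [0.5, 0.8, 0.7]

pv = proportions_at_or_above(distributions, thresholds)
print(pv)  # [0.75, 0.25, 0.0]
```

Note that zip stops at the shorter of its inputs, so a length mismatch between the two lists would silently drop items. On Python 3.10 or later, zip(..., strict=True) raises an error instead.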

Code review:

pv = []
index = 0

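# Also a problem: comp is read only once, before the loop, so every
# distribution is compared against p_95_null[0]. This assignment
# belongs inside the while loop.
comp = p_95_null[index] #What we're comparing it to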
truth_list = []

# Nothing is wrong with this line, but iterating directly is more idiomatic:
# for index, test_list in enumerate(prop_list_test2):
while index<len(p_95_null):
    test_list = [] #Which distribution from prop_list_test2 we are using
    truth_list = []

    for i in prop_list_test2[index]:
        test_list.append(i)

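    # This is why it fails: the while loop uses index to pick a sublist of
    # prop_list_test2, but here index is incremented once per value, so it
    # grows by 459 on each pass (459, 918, 1377) and the while loop exits
    # after only 3 passes. Instead, `index+=1` should be moved out of the
    # for loop, to the end of the while body.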
    for i in test_list:
        if i >= comp:
            truth_list.append(True)
            test_list = []
            index+=1
        elif i < comp:
            truth_list.append(False)
            test_list = []
            index+=1

    pv.append((sum(truth_list)/len(truth_list)))
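
Putting the review comments together, a minimally changed version of the loop might look like the sketch below. The short `prop_list_test2` and `p_95_null` lists are made-up stand-ins so the snippet runs on its own. Besides moving `index += 1`, it reads `comp` inside the loop so that each distribution gets its own threshold.

```python
# Made-up stand-ins for the real data (1000 distributions of 459 floats).
prop_list_test2 = [
    [0.1, 0.5, 0.9, 0.7],
    [0.2, 0.2, 0.8, 0.4],
    [0.3, 0.6, 0.6, 0.6],
]
p_95_null = [0.5, 0.8, 0.7]

pv = []
index = 0
while index < len(p_95_null):
    comp = p_95_null[index]  # threshold for this distribution only
    truth_list = []
    for i in prop_list_test2[index]:
        # The comparison already yields True or False.
        truth_list.append(i >= comp)
    pv.append(sum(truth_list) / len(truth_list))
    index += 1  # advance once per distribution, not once per value

print(pv)  # [0.75, 0.25, 0.0]
```

With the real data this yields one proportion per entry of p_95_null, that is, 1000 values.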

Upvotes: 1
