learning_python

Reputation: 75

python; counting elements of vectors

I would like to count the number of elements of an array that are greater than a certain value t, and save the counts in a vector a. I want to do this for several values of t.

For example:

My vector: c = [0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]

I would like to count the number of elements of c that are greater than t=0.9, than t=0.8, than t=0.7, etc. I then want to save the count for each value of t in a vector.

My code (not working) is:

for t in range(0,10,1):
    for j in range(0, len(c)):    
        if c[j]>t/10:
            a.append(sum(c[j]>t))

my vector a should be of dimension 10, but it isn't!

Can anybody help me out?

Upvotes: 3

Views: 1901

Answers (6)

Jaime

Reputation: 67427

It depends on the sizes of your arrays, but your current solution has O(m*n) complexity, m being the number of values to test and n the size of your array. You may be better off with O((m+n)*log(n)) by first sorting your array in O(n*log(n)) and then using binary search to find the m values in O(m*log(n)). Using numpy and your sample c list, this would be something like:

>>> import numpy as np
>>> c
[0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]
>>> thresholds = np.linspace(0, 1, 10, endpoint=False)
>>> thresholds
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

>>> len(c) - np.sort(c).searchsorted(thresholds, side='right')
array([12, 10,  8,  5,  5,  3,  2,  1,  1,  0])
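The same sort-then-binary-search idea can be sketched without numpy, using the standard-library bisect module. The function name count_greater_sorted is made up for this illustration, and the thresholds are built by division here (an assumption, not what the session above does) so that they compare equal to the literals in c.

```python
from bisect import bisect_right


def count_greater_sorted(values, thresholds):
    """Count, for each threshold, how many values are strictly greater."""
    ordered = sorted(values)  # O(n log n), done once
    n = len(ordered)
    # bisect_right returns how many items are <= t, so the rest are > t.
    return [n - bisect_right(ordered, t) for t in thresholds]


c = [0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]
thresholds = [i / 10.0 for i in range(10)]  # 0.0, 0.1, ..., 0.9

counts = count_greater_sorted(c, thresholds)
print(counts)
```

bisect_right plays the role of side='right' in searchsorted: values equal to the threshold are not counted.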

Upvotes: 1

Reti43

Reputation: 9796

There are a few things wrong with your code.

my vector a should be of dimension 10, but it isn't!

That's because you append more than 10 elements to your list. Look at your logic.

for t in range(0,10,1):
    for j in range(0, len(c)):    
        if c[j]>t/10:
            a.append(sum(c[j]>t))

For each threshold t, you iterate over all 12 items in c one at a time and append something to the list whenever an item passes the test. Overall, you can end up with as many as 120 items instead of 10. What you should be doing instead is (in pseudocode):

for each threshold:
    count = how many elements in c are greater than threshold
    a.append(count)
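A minimal sketch of that pseudocode in plain Python might look like the following. The thresholds list is an assumption about which cut-offs are wanted (0.0 to 0.9 in steps of 0.1).

```python
c = [0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]
thresholds = [i / 10.0 for i in range(10)]  # assumed cut-offs: 0.0 ... 0.9

a = []
for threshold in thresholds:
    # One count per threshold: True counts as 1 when summed.
    count = sum(x > threshold for x in c)
    a.append(count)

print(len(a))  # one entry per threshold
print(a)
```

The key difference from the original loop is that append happens once per threshold, outside the scan over c.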

numpy.where() gives you the indices in an array where a condition is satisfied, so you just have to count how many indices you get each time. We'll get to the full solution in a moment.

Another potential error is t/10, which in Python 2 is integer division and returns 0 for every threshold. The correct way is to force float division with t/10.0. On Python 3, / is float division by default, so this is not a problem there. Notice, though, that you also compute c[j] > t, where t is an integer between 0 and 9, instead of comparing against t/10. Overall, your c[j] > t logic is wrong. You want to use a counter for all elements, as other answers have shown, or collapse it all down to a one-liner list comprehension.
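To make the difference concrete, here is a small illustration. It uses // to mimic what Python 2 does for t/10 with two integers; the values t = 3 and c_j = 0.2 are chosen only for the example.

```python
t = 3
c_j = 0.2  # stands in for one element c[j]

floor_threshold = t // 10    # floor division, like Python 2's t/10 on ints
float_threshold = t / 10.0   # float division on both Python 2 and 3

print(floor_threshold)        # the threshold collapses to 0
print(float_threshold)
print(c_j > floor_threshold)  # compares against 0, not 0.3
print(c_j > float_threshold)  # the intended comparison
print(c_j > t)                # compares against the unscaled integer 3
```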

Finally, here's a solution fully utilising numpy.

import numpy as np

c = np.array([0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1])

thresh = np.arange(0, 1, 0.1)
counts = np.empty(thresh.shape, dtype=int)

for i, t in enumerate(thresh):
    counts[i] = len(np.where(c > t)[0])

print(counts)

Output:

[12 10  8  5  5  3  2  1  1  0]

Letting numpy take care of the loops under the hood is faster than Python-level loops. For demonstration:

import timeit

head = """
import numpy as np

c = np.array([0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1])

thresh = np.arange(0, 1, 0.1)
"""

numpy_where = """
for t in thresh:
    len(np.where(c > t)[0])
"""

python_loop = """
for t in thresh:
    len([element for element in c if element > t])
"""

n = 10000

for test in [numpy_where, python_loop]:
    print(timeit.timeit(test, setup=head, number=n))

Which on my computer results in the following timings.

0.231292377372
0.321743753994
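If the remaining Python-level loop over thresholds matters, it can also be handed to numpy through broadcasting. This is only a sketch: it builds the thresholds as np.arange(10) / 10.0 (an assumption, so that they equal the literals in c), and it allocates a temporary boolean array of shape (len(c), len(thresh)), which may be too large for big inputs.

```python
import numpy as np

c = np.array([0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1])
thresh = np.arange(10) / 10.0  # 0.0, 0.1, ..., 0.9

# c[:, None] has shape (12, 1); comparing it with thresh (shape (10,))
# broadcasts to a (12, 10) boolean array, one column per threshold.
greater = c[:, None] > thresh

# Summing down each column counts the elements above that threshold.
counts = greater.sum(axis=0)
print(counts)
```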

Upvotes: 2

karakfa

Reputation: 67507

If you simplify your code, bugs won't have places to hide!

c=[0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]    
a=[]    
for t in [x/10 for x in range(10)]:
    a.append((t,len([x for x in c if x>t])))

a
[(0.0, 12),
 (0.1, 10),
 (0.2, 8),
 (0.3, 5),
 (0.4, 5),
 (0.5, 3),
 (0.6, 2),
 (0.7, 1),
 (0.8, 1),
 (0.9, 0)]

Or even this one-liner:

[(r/10,len([x for x in c if x>r/10])) for r in range(10)]

Upvotes: 1

ednincer

Reputation: 951

You have to divide by 10.0 (t / 10.0) so the result is a float. In Python 2, t / 10 with two integers gives an integer.

a = []
c=[0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]
for t in range(0,10,1):
    count = 0
    for j in range(0, len(c)):
        if c[j]>t/10.0:
            count = count+1
    a.append(count)
for t in range(0,10,1):
    print(str(a[t]) + ' elements in c are bigger than ' + str(t/10.0))

Output:

12 elements in c are bigger than 0.0
10 elements in c are bigger than 0.1
8 elements in c are bigger than 0.2
5 elements in c are bigger than 0.3
5 elements in c are bigger than 0.4
3 elements in c are bigger than 0.5
2 elements in c are bigger than 0.6
1 elements in c are bigger than 0.7
1 elements in c are bigger than 0.8
0 elements in c are bigger than 0.9

Upvotes: 1

Garrett R

Reputation: 2662

I made a function that loops over the array and increments a counter whenever the value is greater than the supplied threshold.

c=[0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]

def num_bigger(threshold):
    count = 0
    for num in c:
        if num > threshold:
            count +=1

    return count

thresholds = [x/10.0 for x in range(10)]

for thresh in thresholds:
    print(thresh, num_bigger(thresh))

Note that the function checks for strictly greater, which is why, for example, the result is 0 when the threshold is .9.
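Boundary cases like this also depend on how the threshold is computed, because floats that look equal may differ in the last bit. The sketch below uses a hypothetical helper, num_at_least, with a non-strict comparison purely to make the effect visible; it is not part of the answer's function.

```python
c = [0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]


def num_at_least(values, threshold):
    """Hypothetical helper: count values greater than or equal to threshold."""
    return sum(1 for v in values if v >= threshold)


by_division = 3 / 10.0  # equal to the literal 0.3
by_product = 3 * 0.1    # slightly larger than 0.3 in binary floating point

print(by_division == 0.3, by_product == 0.3)
print(num_at_least(c, by_division), num_at_least(c, by_product))
```

Building thresholds as x/10.0, as done above, keeps them equal to the literals stored in c, so elements sitting exactly on a threshold are treated consistently.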

Upvotes: 2

Michal Frystacky

Reputation: 1468

Your problem is here:

if c[j]>t/10:

Notice that both t and 10 are integers, so in Python 2 you perform integer division. The easiest solution, with the fewest changes, is to change it to:

if c[j]>float(t)/10:

to force float division.

So the whole code would look something like this:

a = []
c = [0.3, 0.2, 0.3, 0.6, 0.9, 0.1, 0.2, 0.5, 0.3, 0.5, 0.7, 0.1]
for i in range(10):  # cutoffs 0.0, 0.1, ..., 0.9
    count = 0  # named count so the built-in sum() is not shadowed
    cutoff = float(i)/10
    for ele in c:
        if ele > cutoff:  # strictly greater than the cutoff
            count += 1
    a.append(count)
print(len(a))  # prints 10, one count per cutoff from 0.0 to 0.9
print(a)  # prints the counts for the cutoffs 0.0 through 0.9

Upvotes: 1
