Joel

Reputation: 23907

test if list contains a number in some range

Let's say I have a list L=[1.1, 1.8, 4.4, 5.2]. For some integer, n, I want to know whether L has a value val with n-1<val<n+1, and if so I want to know the index of val.

The best I can do so far is to define a generator

x = (index for index,val in enumerate(L) if n-1<val<n+1)

and check to see whether it has an appropriate value using try... except. So let's assume I'm looking for the smallest n>=0 for which such a value exists...

L=[1.1, 1.8, 4.4, 5.2]
n=0
while True:
    x = (index for index,val in enumerate(L) if n-1<val<n+1)
    try:
        index=next(x)
        break
    except StopIteration:
        n+=1
print(n, index)

1 0

In reality, I'm doing a more complicated task. I'll want to be able to take an n, find the first index, and if it doesn't exist, I need to do something else.

This doesn't seem like particularly clean code to me. Is there a better way? I feel like numpy probably has the answer, but I don't know it well enough.
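For what it's worth, here is a numpy-flavoured sketch of the same loop that I've been experimenting with (`np.flatnonzero` gives the indices where a condition holds):

```python
import numpy as np

L = np.array([1.1, 1.8, 4.4, 5.2])

n = 0
while True:
    # indices of values inside the open interval (n-1, n+1)
    matches = np.flatnonzero(np.abs(L - n) < 1.0)
    if matches.size:
        index = matches[0]
        break
    n += 1

print(n, index)  # 1 0
```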

Upvotes: 2

Views: 6720

Answers (5)

ig-melnyk

Reputation: 2879

Now that I think I finally understand your task: just find the minimum value in the array and its index; n will be equal to int(minimum). Or even simpler:

n,index = int(min(L)),L.index(min(L))
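Note that this computes min(L) twice; a single scan works too (just a sketch, using the fact that tuples compare element-wise):

```python
L = [1.1, 1.8, 4.4, 5.2]

# min over (value, index) pairs finds the smallest value
# and its position in one pass over the list
val, index = min((v, i) for i, v in enumerate(L))
n = int(val)
print(n, index)  # 1 0
```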

Upvotes: 1

Hugh Bothwell

Reputation: 56714

If L is sorted, you could use bisect.bisect_left to find the insertion index i, i.e. the index for which every element of L[:i] is < n and every element of L[i:] is >= n.

Then

i = bisect_left(L, n)
if i > 0 and n - L[i-1] < 1.0:
    val = L[i-1]       # left neighbour is close enough
elif i < len(L) and L[i] - n < 1.0:
    val = L[i]         # right neighbour is close enough
else:
    val = None         # no such value found

Edit: Depending on your data, what you want to accomplish, and how much time you want to spend writing a clever algorithm, sorting may or may not be a good solution for you. Before I see too many more O(n)s waved around: the OP's actual problem seems to involve repeatedly probing for various values of n, which would pretty rapidly amortize the initial sorting overhead, and his suggested algorithm above is actually O(n**2).

@AntoinePelisse: by all means, let's do some profiling:

from bisect import bisect_left, bisect_right
from functools import partial
import matplotlib.pyplot as plt
from random import randint, uniform
from timeit import timeit

# blues
density_col_lin = [
    (0.000, 0.502, 0.000, 1.000),
    (0.176, 0.176, 0.600, 1.000),
    (0.357, 0.357, 0.698, 1.000),
    (0.537, 0.537, 0.800, 1.000)
]

# greens
density_col_sor = [
    (0.000, 0.502, 0.000, 1.000),
    (0.176, 0.600, 0.176, 1.000),
    (0.357, 0.698, 0.357, 1.000),
    (0.537, 0.800, 0.537, 1.000)
]

def make_data(length, density):
    max_ = length / density
    return [uniform(0.0, max_) for _ in range(length)], max_

def linear_probe(L, max_, probes):
    for p in range(probes):
        n = randint(0, int(max_))
        for index,val in enumerate(L):
            if n - 1.0 < val < n + 1.0:
                # return index
                break

def sorted_probe(L, max_, probes):
    # initial sort
    sL = sorted((val,index) for index,val in enumerate(L))
    for p in range(probes):
        n = randint(0, int(max_))
        left  = bisect_right(sL, (n - 1.0, max_))
        right = bisect_left (sL, (n + 1.0, 0.0 ), left)
        if left < right:
            index = min(sL[left:right], key=lambda s:s[1])[1]
            # return index

def main():
    densities = [0.8, 0.2, 0.08, 0.02]
    probes    = [1, 3, 10, 30, 100]
    lengths   = [[]                   for d in densities]
    lin_pts   = [[[] for p in probes] for d in densities]
    sor_pts   = [[[] for p in probes] for d in densities]

    # time each function at various data lengths, densities, and probe repetitions
    for d,density in enumerate(densities):
        for trial in range(200):
            print("{}-{}".format(density, trial))

            # length in 10 to 5000, with log density
            length = int(10 ** uniform(1.0, 3.699))
            L, max_ = make_data(length, density)
            lengths[d].append(length)

            for p,probe in enumerate(probes):
                lin = timeit(partial(linear_probe, L, max_, probe), number=5) / 5
                sor = timeit(partial(sorted_probe, L, max_, probe), number=5) / 5
                lin_pts[d][p].append(lin / probe)
                sor_pts[d][p].append(sor / probe)

    # plot the results
    plt.figure(figsize=(9.,6.))
    plt.axis([0, 5000, 0, 0.004])

    for d,density in enumerate(densities):
        xs = lengths[d]
        lcol = density_col_lin[d]
        scol = density_col_sor[d]

        for p,probe in enumerate(probes):
            plt.plot(xs, lin_pts[d][p], "o", color=lcol, markersize=4.0)
            plt.plot(xs, sor_pts[d][p], "o", color=scol, markersize=4.0)

    plt.show()

if __name__ == "__main__":
    main()

which results in

[scatter plot of the timing results]

x-axis is number of items in L, y-axis is amortized time per probe; green dots are sorted_probe(), blue are linear_probe().

Conclusions:

  • runtimes for both functions are remarkably linear with respect to length
  • for a single probe into L, presorting is about 4 times slower than iterating
  • the crossover point seems to be about 5 probes; for fewer than that, linear search is faster, for more, presorting is faster.

Upvotes: 4

Anzel

Reputation: 20583

Here's an interesting thought: use a defaultdict to build an index mapping int(value) to the values (n-1) and (n+1). This requires looping over the list only once, and thereafter you just compare keys/values, like this:

from collections import defaultdict

L = [1.1, 1.8, 4.4, 5.2]

x = defaultdict(dict)
for idx, item in enumerate(L):
    x[int(item)] = {int(item-1): item-1, int(item+1): item+1, 'index':idx}

Usage:

n = 5

x[n].get(n-1) < n < x[n].get(n+1) and x[n]['index']
Out[8]: 3

n = 2

x[n].get(n-1) < n < x[n].get(n+1) and x[n]['index']
Out[10]: False

Explanation:

This works because `and` short-circuits:

1) `True and index` returns `index`

2) `False and index` returns `False`

Because you are passing n as an integer, if the comparison succeeds the expression evaluates to the stored index; if it fails, it returns False. (Note this relies on Python 2 comparison rules: in Python 3, comparing None with an int raises TypeError.)
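The short-circuit behaviour of `and` in isolation:

```python
# `and` returns the second operand if the first is truthy,
# otherwise it returns the first operand unchanged
print(True and 3)   # 3
print(False and 3)  # False
```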

This will return the LAST occurrence of n; in case you need the FIRST occurrence, just build the index from the reversed list and adjust the stored index:

...
l = len(L)
for idx, item in enumerate(reversed(L)):
    x[int(item)] = {int(item-1): item-1, 
                    int(item+1): item+1, 
                    'index': l-idx-1}
...

Upvotes: 2

jez

Reputation: 15369

Here's a solution that doesn't rely on try... except and is relatively easy to read. As a result it "feels cleaner" to me, but that's always going to have an element of subjectivity to it.

def where_within_range( sequence, lower, upper ):
    for index, value in enumerate( sequence ):
        if lower < value < upper: return index

L = [ 1.1, 1.8, 4.4, 5.2 ]

import itertools
for n in itertools.count():
    index = where_within_range( L, n - 1, n + 1 )
    if index is not None: break

print(n, index)

If you'd prefer to avoid the repeated-function-call overhead, you could instead do it as follows. This once again relies on the StopIteration exception, but by using itertools.count and a return statement it ends up seeming cleaner to me. Perhaps that's because there's only one statement in each part of the try...except clause (there's not much rational basis for that feeling, admittedly).

import itertools
def find_joel_root( sequence ):
    for n in itertools.count():
        solutions = ( index for index, value in enumerate( sequence ) if n - 1 < value < n + 1 )
        try: return n, next( solutions )
        except StopIteration: pass

L = [ 1.1, 1.8, 4.4, 5.2 ]
n, index = find_joel_root( L )
print(n, index)

Upvotes: 2

Phil Cooper

Reputation: 5877

l can be a list or numpy array:

next((i, v) for i, v in enumerate(l) if n - 1 < v < n + 1)

uses a generator and stops at the first matching value (raising StopIteration if there is none).
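To handle the no-match case without catching the exception, next() accepts a default as its second argument (a small sketch):

```python
l = [1.1, 1.8, 4.4, 5.2]

# returns None instead of raising StopIteration
# when nothing lies in the open interval (n-1, n+1)
n = 3
miss = next(((i, v) for i, v in enumerate(l) if n - 1 < v < n + 1), None)
print(miss)  # None

n = 5
hit = next(((i, v) for i, v in enumerate(l) if n - 1 < v < n + 1), None)
print(hit)   # (2, 4.4)
```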

Upvotes: 1
