Reputation:
For e.g. I have a list of number like
lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
And I need 2 numbers at 3 distance from a given number let's say 5, so the output list should look like this
output_lst = [2, 8]
Here by distance I mean the distance on number-line and not in list index. So 3 numbers, 2 distance from 5 would give
output_lst = [3,3,7]
What I tired doing was to use nsmallest from heapq like this
check_number = 5
output_lst = nsmallest(3, lst, key=lambda x: abs(x - check_number))
But the problem here is that I don't know specify the distance. It will just outputs 3 closest numbers to 5.
[4,4,5]
Upvotes: 2
Views: 133
Reputation: 52181
You can use a list comprehension for that. See this post for more on list comprehensions.
>>> lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> given_numer = 5
>>> distance = 3
>>> [i for i in lst if abs(i-given_numer)==distance]
[2, 8]
The logic is quite simple, we just see if the absolute value of the difference between each number and the given number, if so we return the value. Similarly
>>> distance = 2
>>> [i for i in lst if abs(i-given_numer)==distance]
[3, 3, 7]
Let's complicate a bit and try to use filter
and closures. The code is:
Just to show that it is an alternative.
def checkdistance(given_number,distance):
def innerfunc(value):
return abs(value-given_number)==distance
return innerfunc
lst = [1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
given_number = 5
distance = 3
checkdistance3from5 = checkdistance(5,3)
list(filter(checkdistance3from5,lst))
Upvotes: 3
Reputation: 210972
numpy approach:
import numpy as np
check_number = 5
distance = 3
a = np.array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
a[np.absolute(a - check_number) == distance]
Check:
In [46]: a
Out[46]: array([1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9])
In [47]: a[np.absolute(a-5) == 3]
Out[47]: array([2, 8])
Timings (in ms
1/1000 seconds ) for differently sized arrays/lists:
In [141]: df
Out[141]:
numpy list_comprehension
size
10 0.0242 0.0148
20 0.0248 0.0179
30 0.0254 0.0219
50 0.0267 0.0288
100 0.0292 0.0457
1000 0.0712 0.3210
10000 0.4290 3.3700
100000 3.8900 33.6000
1000000 46.4000 343.0000
plot:
bar plot for arrays with size <= 1000 (df[df.index<=1000].plot.bar()
):
Code:
def np_approach(n, check_number=5, distance=3):
a = np.random.randint(0,100, n)
return a[np.absolute(a - check_number) == distance]
def list_comprehension(n, check_number=5, distance=3):
lst = np.random.randint(0,100, n).tolist()
return [i for i in lst if abs(i-check_number)==distance]
In [102]: %timeit list_comprehension(10**2)
10000 loops, best of 3: 45.7 ┬╡s per loop
In [103]: %timeit np_approach(10**2)
10000 loops, best of 3: 29.2 ┬╡s per loop
In [104]: %timeit list_comprehension(10**3)
1000 loops, best of 3: 321 ┬╡s per loop
In [105]: %timeit np_approach(10**3)
The slowest run took 4.48 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 71.2 ┬╡s per loop
In [106]: %timeit list_comprehension(10**4)
100 loops, best of 3: 3.37 ms per loop
In [107]: %timeit np_approach(10**4)
1000 loops, best of 3: 429 ┬╡s per loop
In [108]: %timeit list_comprehension(10**5)
10 loops, best of 3: 33.6 ms per loop
In [109]: %timeit np_approach(10**5)
100 loops, best of 3: 3.89 ms per loop
In [110]: %timeit list_comprehension(10**6)
1 loop, best of 3: 343 ms per loop
In [111]: %timeit np_approach(10**6)
10 loops, best of 3: 46.4 ms per loop
In [112]: %timeit list_comprehension(50)
10000 loops, best of 3: 28.8 ┬╡s per loop
In [113]: %timeit np_approach(50)
10000 loops, best of 3: 26.7 ┬╡s per loop
In [118]: %timeit list_comprehension(40)
The slowest run took 6.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 ┬╡s per loop
In [119]: %timeit np_approach(40)
The slowest run took 8.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.2 ┬╡s per loop
In [120]: %timeit list_comprehension(30)
10000 loops, best of 3: 21.9 ┬╡s per loop
In [121]: %timeit np_approach(30)
10000 loops, best of 3: 25.4 ┬╡s per loop
In [122]: %timeit list_comprehension(20)
100000 loops, best of 3: 17.9 ┬╡s per loop
In [123]: %timeit np_approach(20)
10000 loops, best of 3: 24.8 ┬╡s per loop
In [124]: %timeit list_comprehension(10)
100000 loops, best of 3: 14.8 ┬╡s per loop
In [125]: %timeit np_approach(10)
10000 loops, best of 3: 24.2 ┬╡s per loop
Conclusion: numpy approach is faster compared to list comprehension approach for larger lists, for very small lists (less than 50 elements) it might be other way around
Upvotes: 3