Reputation: 14581
I have a list of millions of numbers. I want to find out if the difference between each number in the ordered list is the same for the entire list.
list_example = [ 0, 5, 10, 15, 20, 25, 30, 35, 40, ..etc etc etc]
What's the best way to do this?
My try:
import collections
list_example = [ 0, 5, 10, 15, 20, 25, 30, 35, 40 ]
count = collections.Counter()
for x,y in zip(list_example[0::],list_example[1::]):
print x,y,y-x
count[y-x] +=1
if len( count ) == 1:
print 'Differences all the same'
Result:
0 5 5
5 10 5
10 15 5
15 20 5
20 25 5
25 30 5
30 35 5
35 40 5
Differences all the same
Upvotes: 19
Views: 46870
Reputation: 10230
Since what you want to find out is actually whether all differences are equal, it might be best to not compute them at all. Instead compute just one, and use it to get the expected numbers and just check that you have them:
list_example = [ 0, 5, 10, 15, 20, 25, 30, 35, 40 ]
start = list_example[0]
step = list_example[1] - start
stop = start + len(list_example) * step
expect = range(start, stop, step)
print(all(x == y for x, y in zip(list_example, expect)))
If it's likely that the differences are all equal, so that you likely have to go through the whole list, it might also be better to finish with list_example == list(expect)
instead, avoiding Python-Level slowness.
Or instead of the generator expression, you could use all(map(operator.eq, list_example, expect))
, which might also be faster.
Benchmark with two million numbers of equal difference:
405 ms expecter_1
242 ms expecter_2
212 ms expecter_3
216 ms expecter_4
219 ms expecter_5
693 ms top_voted
395 ms accepted
def expect(xs):
start = xs[0]
step = xs[1] - start
stop = start + len(xs) * step
return range(start, stop, step)
def expecter_1(xs):
return all(x == y for x, y in zip(xs, expect(xs)))
def expecter_2(xs):
return xs == list(expect(xs))
def expecter_3(xs):
return all(map(operator.eq, xs, expect(xs)))
def expecter_4(xs):
return all(False for x, y in zip(xs, expect(xs)) if x != y)
def expecter_5(xs):
for x, y in zip(xs, expect(xs)):
if x != y:
return False
return True
def top_voted(x):
xdiff = [x[n]-x[n-1] for n in range(1,len(x))]
return all([xdiff[0] == xdiff[n] for n in range(1,len(xdiff))])
def accepted(s):
x = s[1] - s[0]
for i in range(2, len(s)):
if s[i] - s[i-1] != x:
return False
return True
from timeit import repeat
import operator
xs = list(range(0, 10**7, 5))
for f in expecter_1, expecter_2, expecter_3, expecter_4, expecter_5, top_voted, accepted:
t = min(repeat(lambda: f(xs), number=1))
print(f'{t*1e3:.0f} ms ', f.__name__)
Upvotes: 0
Reputation: 1225
This is pretty easy as of 2024 (itertools
is sweet):
>>> l = list(range(1, 100))
>>> import itertools
>>> [p[1] - p[0] for p in itertools.pairwise(l)]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>>
Upvotes: 0
Reputation: 1
I had a similar problem and I used the following method:
foo = map(lambda x: lst[x+1]-lst[x],range(len(lst)-1))
Upvotes: 0
Reputation: 20732
I came to this while playing around:
>>> exm = [0,5,10,15,20,25,30,35]
>>> len(set(exm[a + 1] - exm[a] for a in range(0, len(exm) - 1))) == 1
What I do is for each pair of consecutive items determine their difference in a generator. I then add all those differences to a set to only keep the unique values. If the length of this set is 1 all the differences are the same.
Edit: Looking at cldy's answer you can halt execution early when any item is found not the be the same as your initial difference:
>>> exm = [0,5,10,15,20,25,30,35]
>>> initial_diff = exm[1] - exm[0]
>>> difference_found = any((exm[a + 1] - exm[a]) != initial_diff
for a in range(1, len(exm) - 1))
Upvotes: 3
Reputation: 8051
itertools.izip is great for this kind of thing:
>>> lst = [1,1,2,3,5,8]
>>> [y-x for x, y in itertools.izip (lst, lst[1:])]
[0, 1, 1, 2, 3]
Upvotes: 7
Reputation: 17173
>>> x=[10,15,20,25]
>>> diff=(x[-1]-x[0])/(len(x)-1)
>>> diff
5
>>> all(x[i]+diff==x[i+1] for i in range(len(x)-1))
True
Upvotes: 2
Reputation: 11005
Here's an example using the diff function in numpy
.
e.g.
import numpy
numpy_array = numpy.zeros(10**6)
for i in xrange(10**6):
numpy_array[i] = i
print numpy.any(numpy.diff(a) == 1)
True
Upvotes: 1
Reputation: 3259
Need notice that the list may have millions of numbers. So ideally, we shouldn't iterate over the entire list unless it's necessary. Also we need avoid construct new list, which may have significant memory consumption. Using all and a generator will solve the problem
>>> x = [5, 10, 15, 20, 25]
>>> all(x[i] - x[i-1] == x[i+1] - x[i] for i in xrange(1, len(x) - 1))
True
Upvotes: 9
Reputation: 38247
Here's a solution using iterators just for comparison.. and possibly an advantage of not needing to know the length of your input data; you might be able to avoid having the millions of list items loaded into memory in the first place...
from itertools import tee, izip
# From itertools recipes: http://docs.python.org/library/itertools.html
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return izip(a, b)
class DifferenceError(Exception):
pass
def difference_check(data):
idata = pairwise(data)
(a,b) = idata.next()
delta = b - a
for (a,b) in idata:
if b - a != delta:
raise DifferenceError("%s -> %s != %s" % (a,b,delta))
return True
Upvotes: 0
Reputation: 5914
The straight approach here is the best:
x = s[1] - s[0]
for i in range(2, len(s)):
if s[i] - s[i-1] != x: break
else:
#do some work here...
Upvotes: 10
Reputation: 35088
Using pure Python:
>>> x = [0,5,10,15,20]
>>> xdiff = [x[n]-x[n-1] for n in range(1,len(x))]
>>> xdiff
[5, 5, 5, 5]
>>> all([xdiff[0] == xdiff[n] for n in range(1,len(xdiff))])
True
It's a little easier, and probably faster, if you use NumPy:
>>> import numpy as np
>>> xdiff = np.diff(x)
>>> np.all(xdiff[0] == xdiff)
True
But both of these create two extra lists (or arrays, in the case of NumPy) which may gobble up your available memory if you have millions of numbers.
Upvotes: 28