Reputation: 8045
By now I didn't find a convenient way to split a list by certain conditions, for example, I have a record list:
a = ((0,1),(1,0),(0,2),(1,0),(3,0),(4,0),(0,3),(1,5)....)
I want to split the content into 2 lists
alist = []
blist = []
for x in a:
if x[0] == 0:
alist.append(x)
elif x[0] == 1:
blist.append(x)
Not very concise.
Written as list comprehensions:
aList = [x for x in a if x[0] == 0]
bList = [x for x in a if x[0] == 1]
List comprehensions are usually good for reading and performance, but in this case the list must be iterated twice.
Is there a better way to do this job?
Upvotes: 5
Views: 4919
Reputation: 693
Here's a way to do this that takes advantage of the speed of a list comprehension to make one list, while incrementally appending the rejects onto another list.
blist = []
alist = [x for x in data if x[0]==0 or blist.append(x)]
This takes advantage of Python's greedily evaluating the or
, so for any x
that passes the alist
criterion (x[0]==0
) Python won't bother evaluating the second disjunct (blist.append(x)
). This is a case where it's fairly intuitive to read Python's "or" as meaning "or else do this second thing and use its output instead" (which is technically what it always means).
This also takes advantage of the fact that list.append
always returns a false-like None
value, so that in the case where some x
fails the alist
criterion and thus gets appended to blist
instead, the or
still evaluates as false, so that rejected x
still will not be included in the alist
comprehension.
Here are timeit
results (on my reasonably powerful desktop), each dividing a list of a million random numbers ranging 0-1 into a small
list of those less than 0.1 and a large
list of those larger, each trial repeated 10 times.
0.80 seconds: Using append to create both (suggested by @Ignacio)
0.68 seconds: Separate list comprehensions (suggested by @Max)
0.68 seconds: Using append to create large, comprehend to create small (inverse of what I would suggest!)
0.52 seconds: Using append to create small, comprehend to create large (suggested by me)
These results confirm what you would probably expect: primarily, list comprehension builds big lists a lot faster than append
does, and, secondarily, 1 traversal is faster than 2. If you can predict which list will be longer, better to build that large list by comprehension, while appending just a few rejects into the smaller list.
For anyone who wants to try this themselves, here's the timing code.
import random, timeit
L = [random.random() for i in range(1000000)]
def comprehend_both(): # suggested by @max
small = [n for n in L if n <= 0.1]
large = [n for n in L if n > 0.1]
def append_both(): # suggested by @ignacio
lists = small, large = [[],[]]
for n in L: lists[n > 0.1].append(n) # exploits False==0, True==1
def append_small_comprehend_large(): # suggested by @JustinFisher
small = []
large = [n for n in L if n > 0.1 or small.append(n)]
def comprehend_small_append_large(): # worse since appending more is slower
large = []
small = [n for n in L if n <= 0.1 or large.append(n)]
print( "C/C:", timeit.timeit("comprehend_both()",globals=globals(),number=10)) # 0.68 seconds
print( "A/A:", timeit.timeit("append_both()",globals=globals(),number=10)) # 0.80 seconds
print( "A/C:", timeit.timeit("append_small_comprehend_large()",globals=globals(),number=10)) # returns 0.52
print( "C/A:", timeit.timeit("comprehend_small_append_large()",globals=globals(),number=10)) # returns 0.68
Upvotes: 0
Reputation: 7514
A dict can be more concise.
alist, blist = [], []
for x in a:
{0: alist, 1: blist}.get(x[0]).append(x)
This is a flexible pattern that you can expand to cover other situations.
For example, you can add more choices. You can also gracefully deal with unexpected values by changing .get(x[0])
to .get(x[0], [])
.
Upvotes: 0
Reputation: 976
I would use filter
, see: http://docs.python.org/2/library/functions.html#filter
>>> a = ((0,1), (1,0), (0,2), (1,0), (3,0), (4,0), (0,3), (1,5))
>>> filter(lambda x: x[0] == 1, a)
((1, 0), (1, 0), (1, 5))
To avoid redundant code you could collect the conditions in an iterable, like so:
>>> fs = (lambda x: x[0] == 0, lambda x: x[0] == 1)
>>> for f in fs:
... filter(f, a)
...
((0, 1), (0, 2), (0, 3))
((1, 0), (1, 0), (1, 5))
Upvotes: 0
Reputation: 298156
If you really want to make things complicated, you can import some functions from itertools
to obfuscate your readable solution even more:
from operator import itemgetter
from collections import defaultdict
d = defaultdict(list)
for key, value in itertools.groupby(a, itemgetter(0)):
d[key].append(list(value))
Here is the output:
>>> print d[0]
[[(0, 1)], [(0, 2)], [(0, 3)]]
>>> print d[1]
[[(1, 0)], [(1, 0)], [(1, 5)]]
>>> print d[4]
[[(4, 0)]]
This code just groups the items into a dictionary using the value in the first tuple as the key. It's a little more generic than your code.
Upvotes: 1
Reputation: 65791
Well, the conditions are different, no wonder you need two loops. But if you want to sacrifice some readability,
aList, bList = [[x for x in a if x[0] == i] for i in (0, 1)]
Upvotes: 6
Reputation: 798606
Adding one line will make the loop more concise at the cost of readability (and FPness).
alist = []
blist = []
bothlists = [alist, blist]
for x in a:
bothlists[x[0]].append(x)
Upvotes: 6