Max
Max

Reputation: 8045

How to split a list on a condition?

By now I didn't find a convenient way to split a list by certain conditions, for example, I have a record list:

 a = ((0,1),(1,0),(0,2),(1,0),(3,0),(4,0),(0,3),(1,5)....)

I want to split the content into 2 lists

alist = []
blist = []
for x in a:
    if x[0] == 0:
        alist.append(x)
    elif x[0] == 1:
        blist.append(x)

Not very concise.

Written as list comprehensions:

aList = [x for x in a if x[0] == 0]
bList = [x for x in a if x[0] == 1]

List comprehensions are usually good for reading and performance, but in this case the list must be iterated twice.

Is there a better way to do this job?

Upvotes: 5

Views: 4919

Answers (6)

JustinFisher
JustinFisher

Reputation: 693

Here's a way to do this that takes advantage of the speed of a list comprehension to make one list, while incrementally appending the rejects onto another list.

blist = []
alist = [x for x in data if x[0]==0 or blist.append(x)]

This takes advantage of Python's greedily evaluating the or, so for any x that passes the alist criterion (x[0]==0) Python won't bother evaluating the second disjunct (blist.append(x)). This is a case where it's fairly intuitive to read Python's "or" as meaning "or else do this second thing and use its output instead" (which is technically what it always means).

This also takes advantage of the fact that list.append always returns a false-like None value, so that in the case where some x fails the alist criterion and thus gets appended to blist instead, the or still evaluates as false, so that rejected x still will not be included in the alist comprehension.

Here are timeit results (on my reasonably powerful desktop), each dividing a list of a million random numbers ranging 0-1 into a small list of those less than 0.1 and a large list of those larger, each trial repeated 10 times.

0.80 seconds: Using append to create both (suggested by @Ignacio)
0.68 seconds: Separate list comprehensions (suggested by @Max)
0.68 seconds: Using append to create large, comprehend to create small (inverse of what I would suggest!)
0.52 seconds: Using append to create small, comprehend to create large (suggested by me)

These results confirm what you would probably expect: primarily, list comprehension builds big lists a lot faster than append does, and, secondarily, 1 traversal is faster than 2. If you can predict which list will be longer, better to build that large list by comprehension, while appending just a few rejects into the smaller list.


For anyone who wants to try this themselves, here's the timing code.

import random, timeit
L = [random.random() for i in range(1000000)]

def comprehend_both():  # suggested by @max
    small = [n for n in L if n <= 0.1]
    large = [n for n in L if n > 0.1]

def append_both():   # suggested by @ignacio
    lists = small, large = [[],[]]
    for n in L: lists[n > 0.1].append(n)  # exploits False==0, True==1

def append_small_comprehend_large():  # suggested by @JustinFisher
    small = []
    large = [n for n in L if n > 0.1 or small.append(n)]

def comprehend_small_append_large():  # worse since appending more is slower
    large = []
    small = [n for n in L if n <= 0.1 or large.append(n)]

print( "C/C:", timeit.timeit("comprehend_both()",globals=globals(),number=10)) # 0.68 seconds
print( "A/A:", timeit.timeit("append_both()",globals=globals(),number=10))     # 0.80 seconds
print( "A/C:", timeit.timeit("append_small_comprehend_large()",globals=globals(),number=10)) # returns 0.52
print( "C/A:", timeit.timeit("comprehend_small_append_large()",globals=globals(),number=10)) # returns 0.68

Upvotes: 0

ChaimG
ChaimG

Reputation: 7514

A dict can be more concise.

alist, blist = [], []
for x in a:
    {0: alist, 1: blist}.get(x[0]).append(x)

This is a flexible pattern that you can expand to cover other situations.

For example, you can add more choices. You can also gracefully deal with unexpected values by changing .get(x[0]) to .get(x[0], []).

Upvotes: 0

atomocopter
atomocopter

Reputation: 976

I would use filter, see: http://docs.python.org/2/library/functions.html#filter

>>> a = ((0,1), (1,0), (0,2), (1,0), (3,0), (4,0), (0,3), (1,5))
>>> filter(lambda x: x[0] == 1, a)
((1, 0), (1, 0), (1, 5))

To avoid redundant code you could collect the conditions in an iterable, like so:

>>> fs = (lambda x: x[0] == 0, lambda x: x[0] == 1)
>>> for f in fs:
...     filter(f, a)
...
((0, 1), (0, 2), (0, 3))
((1, 0), (1, 0), (1, 5))

Upvotes: 0

Blender
Blender

Reputation: 298156

If you really want to make things complicated, you can import some functions from itertools to obfuscate your readable solution even more:

from operator import itemgetter
from collections import defaultdict

d = defaultdict(list)

for key, value in itertools.groupby(a, itemgetter(0)):
    d[key].append(list(value))

Here is the output:

>>> print d[0]
[[(0, 1)], [(0, 2)], [(0, 3)]]
>>> print d[1]
[[(1, 0)], [(1, 0)], [(1, 5)]]
>>> print d[4]
[[(4, 0)]]

This code just groups the items into a dictionary using the value in the first tuple as the key. It's a little more generic than your code.

Upvotes: 1

Lev Levitsky
Lev Levitsky

Reputation: 65791

Well, the conditions are different, no wonder you need two loops. But if you want to sacrifice some readability,

aList, bList = [[x for x in a if x[0] == i] for i in (0, 1)]

Upvotes: 6

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 798606

Adding one line will make the loop more concise at the cost of readability (and FPness).

alist = []
blist = []
bothlists = [alist, blist]
for x in a:
  bothlists[x[0]].append(x)

Upvotes: 6

Related Questions