Reputation: 106280
I have some code like:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
The goal is to split up the contents of mylist
into two other lists, based on whether or not they meet a condition.
How can I do this more elegantly? Can I avoid doing two separate iterations over mylist
? Can I improve performance by doing so?
Upvotes: 402
Views: 307871
Reputation: 1534
Already quite a few solutions here, but yet another way of doing that would be -
anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
Iterates over the list only once, and looks a bit more pythonic and hence readable to me.
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>
Upvotes: 0
Reputation: 4449
import functools
def partition(pred, seq):
return functools.reduce(
lambda yes_no, x: (yes_no[0]+[x], yes_no[1]) if pred(x) else (yes_no[0], yes_no[1]+[x]), seq, ([], [])
)
example
files = [ ('file1.jpg', '33L', '.jpg'), ('file2.avi', '999L', '.avi'), ('file1.bmp', '33L', '.bmp')]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images, anims = partition(lambda f: f[2].lower() in IMAGE_TYPES, files)
print('\n'.join([str(anims), str(images)]))
# => [('file2.avi', '999L', '.avi')]
# => [('file1.jpg', '33L', '.jpg'), ('file1.bmp', '33L', '.bmp')]
Upvotes: 0
Reputation: 1499
Yet another answer, short but "evil" (for list-comprehension side effects).
digits = list(range(10))
odd = [digits.pop(i) for i, x in enumerate(digits) if x % 2]
>>> odd
[1, 3, 5, 7, 9]
>>> digits
[0, 2, 4, 6, 8]
Upvotes: 1
Reputation: 169563
good = [x for x in mylist if x in goodvals] bad = [x for x in mylist if x not in goodvals]
is there a more elegant way to do this?
That code is perfectly readable, and extremely clear!
# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f[2].lower() not in IMAGE_TYPES]
Again, this is fine!
There might be slight performance improvements using sets, but it's a trivial difference, and I find the list comprehension far easier to read, and you don't have to worry about the order being messed up, duplicates being removed as so on.
In fact, I may go another step "backward", and just use a simple for loop:
images, anims = [], []
for f in files:
if f.lower() in IMAGE_TYPES:
images.append(f)
else:
anims.append(f)
The a list-comprehension or using set()
is fine until you need to add some other check or another bit of logic - say you want to remove all 0-byte jpeg's, you just add something like..
if f[1] == 0:
continue
Upvotes: 156
Reputation: 304147
Iterate manually, using the condition to select a list to which each element will be appended:
good, bad = [], []
for x in mylist:
(bad, good)[x in goodvals].append(x)
Upvotes: 334
Reputation: 26900
Simple generator version, holds the least amount of values possible in memory and calls pred
only once:
from collections import deque
from typing import Callable, TypeVar, Iterable
_T = TypeVar('_T')
def iter_split(pred: Callable[[_T], bool],
iterable: Iterable[_T]) -> tuple[Iterable[_T], Iterable[_T]]:
"""Split an iterable into two iterables based on a predicate.
The predicate will only be called once per element.
Returns:
A tuple of two iterables, the first containing all elements for which
the predicate returned True, the second containing all elements for
which the predicate returned False.
"""
iterator = iter(iterable)
true_values: deque[_T] = deque()
false_values: deque[_T] = deque()
def true_generator():
while True:
while true_values:
yield true_values.popleft()
for item in iterator:
if pred(item):
yield item
break
false_values.append(item)
else:
break
def false_generator():
while True:
while false_values:
yield false_values.popleft()
for item in iterator:
if not pred(item):
yield item
break
true_values.append(item)
else:
break
return true_generator(), false_generator()
Upvotes: 0
Reputation: 7514
Inspired by DanSalmo's comment, here is a solution that is concise, elegant, and at the same time is one of the fastest solutions.
good_set = set(goodvals)
good, bad = [], []
for item in my_list:
good.append(item) if item in good_set else bad.append(item)
Tip: Turning goodvals
into a set gives us an easy speed boost.
For maximum speed, we take the fastest answer and turbocharge it by turning good_list into a set. That alone gives us a 40%+ speed boost, and we end up with a solution that is more than 5.5x as fast as the slowest solution, even while it remains readable.
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
if item in good_list_set:
good.append(item)
else:
bad.append(item)
This is a more concise version of the previous answer.
good_list_set = set(good_list) # 40%+ faster than a tuple.
good, bad = [], []
for item in my_origin_list:
out = good if item in good_list_set else bad
out.append(item)
Elegance can be somewhat subjective, but some of the Rube Goldberg style solutions that are cute and ingenious are quite concerning and should not be used in production code in any language, let alone python which is elegant at heart.
Benchmark results:
filter_BJHomer 80/s -- -3265% -5312% -5900% -6262% -7273% -7363% -8051% -8162% -8244%
zip_Funky 118/s 4848% -- -3040% -3913% -4450% -5951% -6085% -7106% -7271% -7393%
two_lst_tuple_JohnLaRoy 170/s 11332% 4367% -- -1254% -2026% -4182% -4375% -5842% -6079% -6254%
if_else_DBR 195/s 14392% 6428% 1434% -- -882% -3348% -3568% -5246% -5516% -5717%
two_lst_compr_Parand 213/s 16750% 8016% 2540% 967% -- -2705% -2946% -4786% -5083% -5303%
if_else_1_line_DanSalmo 292/s 26668% 14696% 7189% 5033% 3707% -- -331% -2853% -3260% -3562%
tuple_if_else 302/s 27923% 15542% 7778% 5548% 4177% 343% -- -2609% -3029% -3341%
set_1_line 409/s 41308% 24556% 14053% 11035% 9181% 3993% 3529% -- -569% -991%
set_shorter 434/s 44401% 26640% 15503% 12303% 10337% 4836% 4345% 603% -- -448%
set_if_else 454/s 46952% 28358% 16699% 13349% 11290% 5532% 5018% 1100% 469% --
The full benchmark code for Python 3.7 (modified from FunkySayu):
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
import random
import string
my_origin_list = []
for i in range(10000):
fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(list(good_list))
else:
fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
# Parand
def two_lst_compr_Parand(*_):
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def if_else_DBR(*_):
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# # Ants Aasma
# def f4():
# l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
# return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def zip_Funky(*_):
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def filter_BJHomer(*_):
return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))
# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list else bad.append(e)
return good, bad
# ChaimG's answer; as a set.
def set_1_line(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list_set else bad.append(e)
return good, bad
# ChaimG set and if else list.
def set_shorter(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
out = good if e[2] in good_list_set else bad
out.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def set_if_else(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_set:
good.append(e)
else:
bad.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
good_list_tuple = tuple(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_tuple:
good.append(e)
else:
bad.append(e)
return good, bad
def cmpthese(n=0, functions=None):
results = {}
for func_name in functions:
args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
t = Timer(*args)
results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec
functions_sorted = sorted(functions, key=results.__getitem__)
for f in functions_sorted:
diff = []
for func in functions_sorted:
if func == f:
diff.append("--")
else:
diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
diffs = " ".join(f'{x:>8s}' for x in diff)
print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")
if __name__=='__main__':
from timeit import Timer
cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))
Upvotes: 17
Reputation: 693
This question has many answers already, but all seem inferior to my favorite approach to this problem, which iterates through and tests each item just once, and uses the speed of list-comprehension to build one of the two output lists, so that it only has to use the comparatively slow append
to build one of the output lists:
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]
In my answer to a similar question, I explain how this approach works (a combination of Python's greedy evaluation of or
refraining from executing the append
for "good" items, and append
returning a false-like value which leaves the if
condition false for "bad" items), and I show timeit
results indicating that this approach outcompetes alternatives like those suggested here, especially in cases where the majority of items will go into the list built by list-comprehension (in this case, the good
list).
Upvotes: -1
Reputation: 156
I turned to numpy
to solve this to limit lines and make into a simple function.
I was able to get a conditional met, separating a list into two, using np.where
to separate out a list. This works for numeric, but this could be expanded using strings and lists I believe.
Here it is...
from numpy import where as wh, array as arr
midz = lambda a, mid: (a[wh(a > mid)], a[wh((a =< mid))])
p_ = arr([i for i in [75, 50, 403, 453, 0, 25, 428] if i])
high,low = midz(p_, p_.mean())
Upvotes: 0
Reputation: 7514
This list comprehension is simple to read and fast. Exactly what the OP asked for.
set_good_vals = set(good_vals) # Speed boost.
good = [x for x in my_list if x in set_good_vals]
bad = [x for x in my_list if x not in set_good_vals]
I would prefer a single list comprehension rather than two, but unlike many of the answers posted (some of which are quite ingenious) it is readable and clear. It is also one of the fastest answers on the page.
The only answer that is [slightly] faster is:
set_good_vals = set(good_vals)
good, bad = [], []
for item in my_list:
_ = good.append(item) if item in set_good_vals else bad.append(item)
...and variations of this. (See my other answer). But I find the first way more elegant and it's almost as fast.
Upvotes: 0
Reputation: 671
A generator based version, if you can put up with one or maybe two reversals of the original list.
A setup...
random.seed(1234)
a = list(range(10))
random.shuffle(a)
a
[2, 8, 3, 5, 6, 4, 9, 0, 1, 7]
And the split...
(list((a.pop(j) for j, y in [(len(a)-i-1, x) for i,x in enumerate(a[::-1])] if y%2 == 0))[::-1], a)
([2, 8, 6, 4, 0], [3, 5, 9, 1, 7])
Upvotes: 0
Reputation: 353
The previous answers don't seem to satisfy all four of my obsessive compulsive ticks:
My solution isn't pretty, and I don't think I can recommend using it, but here it is:
def iter_split_on_predicate(predicate: Callable[[T], bool], iterable: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
deque_predicate_true = deque()
deque_predicate_false = deque()
# define a generator function to consume the input iterable
# the Predicate is evaluated once per item, added to the appropriate deque, and the predicate result it yielded
def shared_generator(definitely_an_iterator):
for item in definitely_an_iterator:
print("Evaluate predicate.")
if predicate(item):
deque_predicate_true.appendleft(item)
yield True
else:
deque_predicate_false.appendleft(item)
yield False
# consume input iterable only once,
# converting to an iterator with the iter() function if necessary. Probably this conversion is unnecessary
shared_gen = shared_generator(
iterable if isinstance(iterable, collections.abc.Iterator) else iter(iterable)
)
# define a generator function for each predicate outcome and queue
def iter_for(predicate_value, hold_queue):
def consume_shared_generator_until_hold_queue_contains_something():
if not hold_queue:
try:
while next(shared_gen) != predicate_value:
pass
except:
pass
consume_shared_generator_until_hold_queue_contains_something()
while hold_queue:
print("Yield where predicate is "+str(predicate_value))
yield hold_queue.pop()
consume_shared_generator_until_hold_queue_contains_something()
# return a tuple of two generators
return iter_for(predicate_value=True, hold_queue=deque_predicate_true), iter_for(predicate_value=False, hold_queue=deque_predicate_false)
Testing with the following we get the output below from the print statements:
t,f = iter_split_on_predicate(lambda item:item>=10,[1,2,3,10,11,12,4,5,6,13,14,15])
print(list(zip(t,f)))
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# [(10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, 6)]
Upvotes: 0
Reputation: 136
Use Boolean logic to assign data to two arrays
>>> images, anims = [[i for i in files if t ^ (i[2].lower() in IMAGE_TYPES) ] for t in (0, 1)]
>>> images
[('file1.jpg', 33, '.jpg')]
>>> anims
[('file2.avi', 999, '.avi')]
Upvotes: 1
Reputation: 1498
You can do lazy functional programming in Python, like this:
partition = lambda l, c: map(
lambda iii: (i for ii in iii for i in ii),
zip(*(([], [e]) if c(e) else ([e], []) for e in l)))
Functional programming is elegant, but not in Python. See also this example in case you know there are not None
values in your list:
partition = lambda l, c: map(
filter(lambda x: x is not None, l),
zip(*((None, e) if c(e) else (e, None) for e in l)))
Upvotes: 0
Reputation: 400
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f not in images]
Nice when the condition is longer, such as in your example. The reader doesn't have to figure out the negative condition and whether it captures all other cases.
Upvotes: 0
Reputation: 2615
Problem with all proposed solutions is that it will scan and apply the filtering function twice. I'd make a simple small function like this:
def split_into_two_lists(lst, f):
a = []
b = []
for elem in lst:
if f(elem):
a.append(elem)
else:
b.append(elem)
return a, b
That way you are not processing anything twice and also are not repeating code.
Upvotes: 29
Reputation: 6037
good.append(x) if x in goodvals else bad.append(x)
This elegant and concise answer by @dansalmo showed up buried in the comments, so I'm just reposting it here as an answer so it can get the prominence it deserves, especially for new readers.
Complete example:
good, bad = [], []
for x in my_list:
good.append(x) if x in goodvals else bad.append(x)
Upvotes: 12
Reputation: 571
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]
append returns None, so it works.
Upvotes: 10
Reputation: 1065
If the list is made of groups and intermittent separators, you can use:
def split(items, p):
groups = [[]]
for i in items:
if p(i):
groups.append([])
groups[-1].append(i)
return groups
Usage:
split(range(1,11), lambda x: x % 3 == 0)
# gives [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
Upvotes: 1
Reputation: 3732
Inspired by @gnibbler's great (but terse!) answer, we can apply that approach to map to multiple partitions:
from collections import defaultdict
def splitter(l, mapper):
"""Split an iterable into multiple partitions generated by a callable mapper."""
results = defaultdict(list)
for x in l:
results[mapper(x)] += [x]
return results
Then splitter
can then be used as follows:
>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0) # partition l into odds and evens
>>> split.items()
>>> [(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]
This works for more than two partitions with a more complicated mapping (and on iterators, too):
>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
(1, [2]),
(2, [3]),
(3, [4, 5, 6]),
(4, [7, 8, 9]),
(5, [10, 11, 12, 13, 14, 15]),
(6, [16, 17, 18, 19, 20, 21, 22])]
Or using a dictionary to map:
>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]
Upvotes: 2
Reputation: 21
For example, splitting list by even and odd
arr = range(20)
even, odd = reduce(lambda res, next: res[next % 2].append(next) or res, arr, ([], []))
Or in general:
def split(predicate, iterable):
return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))
Advantages:
Disadvantages
Upvotes: 2
Reputation: 13
Not sure if this is a good approach but it can be done in this way as well
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))
Upvotes: 0
Reputation: 41491
My take on it. I propose a lazy, single-pass, partition
function,
which preserves relative order in the output subsequences.
I assume that the requirements are:
i
)filter
or groupby
)split
libraryMy partition
function (introduced below) and other similar functions
have made it into a small library:
It's installable normally via PyPI:
pip install --user split
To split a list base on condition, use partition
function:
>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]
partition
function explainedInternally we need to build two subsequences at once, so consuming
only one output sequence will force the other one to be computed
too. And we need to keep state between user requests (store processed
but not yet requested elements). To keep state, I use two double-ended
queues (deques
):
from collections import deque
SplitSeq
class takes care of the housekeeping:
class SplitSeq:
def __init__(self, condition, sequence):
self.cond = condition
self.goods = deque([])
self.bads = deque([])
self.seq = iter(sequence)
Magic happens in its .getNext()
method. It is almost like .next()
of the iterators, but allows to specify which kind of element we want
this time. Behind the scene it doesn't discard the rejected elements,
but instead puts them in one of the two queues:
def getNext(self, getGood=True):
if getGood:
these, those, cond = self.goods, self.bads, self.cond
else:
these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
if these:
return these.popleft()
else:
while 1: # exit on StopIteration
n = self.seq.next()
if cond(n):
return n
else:
those.append(n)
The end user is supposed to use partition
function. It takes a
condition function and a sequence (just like map
or filter
), and
returns two generators. The first generator builds a subsequence of
elements for which the condition holds, the second one builds the
complementary subsequence. Iterators and generators allow for lazy
splitting of even long or infinite sequences.
def partition(condition, sequence):
cond = condition if condition else bool # evaluate as bool if condition == None
ss = SplitSeq(cond, sequence)
def goods():
while 1:
yield ss.getNext(getGood=True)
def bads():
while 1:
yield ss.getNext(getGood=False)
return goods(), bads()
I chose the test function to be the first argument to facilitate
partial application in the future (similar to how map
and filter
have the test function as the first argument).
Upvotes: 18
Reputation: 152647
If you don't mind using an external library there two I know that nativly implement this operation:
>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
iteration_utilities.partition
:
>>> from iteration_utilities import partition
>>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES)
>>> notimages
[('file2.avi', 999, '.avi')]
>>> images
[('file1.jpg', 33, '.jpg')]
>>> from more_itertools import partition
>>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files)
>>> list(notimages) # returns a generator so you need to explicitly convert to list.
[('file2.avi', 999, '.avi')]
>>> list(images)
[('file1.jpg', 33, '.jpg')]
Upvotes: 5
Reputation: 54882
Here's the lazy iterator approach:
from itertools import tee
def split_on_condition(seq, condition):
l1, l2 = tee((condition(item), item) for item in seq)
return (i for p, i in l1 if p), (i for p, i in l2 if not p)
It evaluates the condition once per item and returns two generators, first yielding values from the sequence where the condition is true, the other where it's false.
Because it's lazy you can use it on any iterator, even an infinite one:
from itertools import count, islice
def is_prime(n):
return n > 1 and all(n % i for i in xrange(2, n))
primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))
Usually though the non-lazy list returning approach is better:
def split_on_condition(seq, condition):
a, b = [], []
for item in seq:
(a if condition(item) else b).append(item)
return a, b
Edit: For your more specific usecase of splitting items into different lists by some key, heres a generic function that does that:
DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
"""Split a sequence into lists based on a key function.
seq - input sequence
resultmapping - a dictionary that maps from target lists to keys that go to that list
keyfunc - function to calculate the key of an input value
default - the target where items that don't have a corresponding key go, by default they are dropped
"""
result_lists = dict((key, []) for key in resultmapping)
appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)
if default is not DROP_VALUE:
result_lists.setdefault(default, [])
default_action = result_lists[default].append
else:
default_action = DROP_VALUE
for item in seq:
appenders.get(keyfunc(item), default_action)(item)
return result_lists
Usage:
def file_extension(f):
return f[2].lower()
split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']
Upvotes: 127
Reputation: 542
Yet another solution to this problem. I needed a solution that is as fast as possible. That means only one iteration over the list and preferably O(1) for adding data to one of the resulting lists. This is very similar to the solution provided by sastanin, except much shorter:
from collections import deque
def split(iterable, function):
dq_true = deque()
dq_false = deque()
# deque - the fastest way to consume an iterator and append items
deque((
(dq_true if function(item) else dq_false).append(item) for item in iterable
), maxlen=0)
return dq_true, dq_false
Then, you can use the function in the following way:
lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)
selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})
If you're not fine with the resulting deque
object, you can easily convert it to list
, set
, whatever you like (for example list(lower)
). The conversion is much faster, that construction of the lists directly.
This methods keeps order of the items, as well as any duplicates.
Upvotes: 2
Reputation: 8061
Sometimes, it looks like list comprehension is not the best thing to use !
I made a little test based on the answer people gave to this topic, tested on a random generated list. Here is the generation of the list (there's probably a better way to do, but it's not the point) :
good_list = ('.jpg','.jpeg','.gif','.bmp','.png')
import random
import string
my_origin_list = []
for i in xrange(10000):
fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(good_list)
else:
fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
And here we go
# Parand
def f1():
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def f2():
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def f3():
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# Ants Aasma
def f4():
l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def f5():
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def f6():
return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)
Using the cmpthese function, the best result is the dbr answer :
f1 204/s -- -5% -14% -15% -20% -26%
f6 215/s 6% -- -9% -11% -16% -22%
f3 237/s 16% 10% -- -2% -7% -14%
f4 240/s 18% 12% 2% -- -6% -13%
f5 255/s 25% 18% 8% 6% -- -8%
f2 277/s 36% 29% 17% 15% 9% --
Upvotes: 5
Reputation: 1408
I think a generalization of splitting a an iterable based on N conditions is handy
from collections import OrderedDict
def partition(iterable,*conditions):
'''Returns a list with the elements that satisfy each of condition.
Conditions are assumed to be exclusive'''
d= OrderedDict((i,list())for i in range(len(conditions)))
for e in iterable:
for i,condition in enumerate(conditions):
if condition(e):
d[i].append(e)
break
return d.values()
For instance:
ints,floats,other = partition([2, 3.14, 1, 1.69, [], None],
lambda x: isinstance(x, int),
lambda x: isinstance(x, float),
lambda x: True)
print " ints: {}\n floats:{}\n other:{}".format(ints,floats,other)
ints: [2, 1]
floats:[3.14, 1.69]
other:[[], None]
If the element may satisfy multiple conditions, remove the break.
Upvotes: 4
Reputation: 11
from itertools import tee
def unpack_args(fn):
return lambda t: fn(*t)
def separate(fn, lx):
return map(
unpack_args(
lambda i, ly: filter(
lambda el: bool(i) == fn(el),
ly)),
enumerate(tee(lx, 2)))
[even, odd] = separate(
lambda x: bool(x % 2),
[1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])
Upvotes: 1
Reputation: 47061
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
Check this
Upvotes: 4