Reputation: 3821
Is there a canonical way to emit multiple items from a single item in the input sequence, so that they form one continuous sequence and I don't need a reduce(...) just to flatten the result?
For example, if I wanted to expand each number in a list into its individual digits
[1,12,123,1234,12345] => [1,1,2,1,2,3,1,2,3,4,1,2,3,4,5]
then I'd write some Python that looks a bit like this:
from functools import reduce  # reduce is not a builtin in Python 3
somedata = [1,12,123,1234,12345]
listified = map(lambda x:[int(c) for c in str(x)], somedata)
flattened = reduce(lambda x,y: x+y,listified,[])
but I would prefer to avoid the flattened = reduce(...) step if there is a neater (or more efficient) way to express this.
Upvotes: 2
Views: 1180
Reputation: 2469
Here's how the transformation goes:
1. 12 -> '12'
2. '12' -> ['1', '2']
3. ['1', '2'] -> '1', '2'
4. '1' -> 1
We can use Pyterator for this:
from pyterator import iterate
(
iterate([1, 12, 123, 1234, 12345])
.flat_map(lambda x: list(str(x))) # Steps 1-3
.map(int) # Step 4
.to_list()
)
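If adding a dependency is not an option, the same flat-map step can be sketched with the standard library alone. The helper name flat_map below is hypothetical (it is not a builtin); it simply wraps itertools.chain.from_iterable around map:

```python
from itertools import chain


def flat_map(func, iterable):
    """Apply func to each item and lazily chain the resulting iterables."""
    return chain.from_iterable(map(func, iterable))


somedata = [1, 12, 123, 1234, 12345]

# str(x) is already iterable character by character, so no list(...) is needed.
digits = list(map(int, flat_map(str, somedata)))
print(digits)
```

Because chain.from_iterable is lazy, nothing is materialized until list(...) consumes the result.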
Upvotes: 0
Reputation: 879251
map(func, *iterables) will always call func as many times as the length of the shortest iterable (assuming no Exception is raised). Functions always return a single object. So list(map(func, *iterables)) will always have the same length as the shortest iterable.
Thus list(map(lambda x:[int(c) for c in str(x)], somedata)) will always have the same length as somedata. There is no way around that.
If the desired result (e.g. [1,1,2,1,2,3,1,2,3,4,1,2,3,4,5]) has more items than the input (e.g. [1,12,123,1234,12345]), then something other than map must be used to produce it.
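A quick check of that length rule, as a small sketch (the variable names listified and pairs are only illustrative):

```python
somedata = [1, 12, 123, 1234, 12345]

# One input item in, exactly one result out: the digit lists stay nested.
listified = list(map(lambda x: [int(c) for c in str(x)], somedata))
print(len(listified) == len(somedata))

# With several iterables, map stops at the shortest one.
pairs = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20]))
print(pairs)
```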
You could, for example, use itertools.chain.from_iterable to flatten an iterable of iterables by one level:
In [31]: import itertools as IT
In [32]: somedata = [1,12,123,1234,12345]
In [33]: list(map(int, IT.chain.from_iterable(map(str, somedata))))
Out[33]: [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
or, to flatten a list of lists, sum(..., []) suffices:
In [44]: sum(map(lambda x:[int(c) for c in str(x)], somedata), [])
Out[44]: [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
but note that this is much slower than using IT.chain.from_iterable (see below), because each + builds a new list and copies everything accumulated so far.
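A reduce-style fold can stay roughly linear if the accumulator is extended in place instead. The following is a minimal sketch using operator.iconcat from the standard library; it is an alternative to the sum(..., []) approach, not something timed in the benchmark:

```python
import functools
import operator

somedata = [1, 12, 123, 1234, 12345]
listified = [[int(c) for c in str(x)] for x in somedata]

# operator.iconcat(a, b) performs a += b, so the single accumulator list
# is extended in place rather than rebuilt at every step.
flattened = functools.reduce(operator.iconcat, listified, [])
print(flattened)
```

The empty list passed as the initial value is the only object that gets mutated, so the inner lists in listified are left untouched.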
Here is a benchmark (using IPython's %timeit) testing the various methods on a list of 10,000 integers from 0 to a million:
In [4]: import random
In [8]: import functools
In [49]: somedata = [random.randint(0, 10**6) for i in range(10**4)]
In [50]: %timeit list(map(int, IT.chain.from_iterable(map(str, somedata))))
100 loops, best of 3: 9.35 ms per loop
In [13]: %timeit [int(i) for i in list(''.join(str(somedata)[1:-1].replace(', ','')))]
100 loops, best of 3: 12.2 ms per loop
In [52]: %timeit [int(j) for i in somedata for j in str(i)]
100 loops, best of 3: 12.3 ms per loop
In [51]: %timeit sum(map(lambda x:[int(c) for c in str(x)], somedata), [])
1 loop, best of 3: 869 ms per loop
In [9]: %timeit listified = map(lambda x:[int(c) for c in str(x)], somedata); functools.reduce(lambda x,y: x+y,listified,[])
1 loop, best of 3: 871 ms per loop
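To repeat a comparison like this outside IPython, a script along the following lines could be used. It is only a sketch: the function names, the fixed seed and the repeat count are arbitrary choices, and the absolute timings will differ by machine and Python version.

```python
import itertools
import random
import timeit

random.seed(0)  # arbitrary seed, only so repeated runs use the same data
somedata = [random.randint(0, 10**6) for _ in range(10**4)]


def with_chain():
    return list(map(int, itertools.chain.from_iterable(map(str, somedata))))


def with_comprehension():
    return [int(j) for i in somedata for j in str(i)]


def with_sum():
    return sum(map(lambda x: [int(c) for c in str(x)], somedata), [])


REPEATS = 3  # kept small because the sum-based version is slow
results = {}
for func in (with_chain, with_comprehension, with_sum):
    seconds = timeit.timeit(func, number=REPEATS)
    results[func.__name__] = seconds
    print(f"{func.__name__}: {seconds / REPEATS * 1000:.2f} ms per call")
```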
Upvotes: 3
Reputation: 1002
Two ideas. The first uses a list comprehension:
print([int(j) for i in somedata for j in list(str(i))])
As pointed out in the comments, a string is already iterable, so the list(...) call can be dropped:
print([int(j) for i in somedata for j in str(i)])
The second combines string operations with a list comprehension:
print([int(i) for i in list(''.join(str(somedata)[1:-1].replace(', ','')))])
Output in each case:
[1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
Upvotes: 2