Rich

Reputation: 3821

python map/reduce: emit multiple keys values from single map lambda

Is there a canonical way to emit multiple keys from a single item in the input sequence, so that they form one continuous sequence and I don't need a reduce(...) call just to flatten the result?

e.g. if I wanted to expand each digit in a series of numbers into individual numbers in a sequence

[1,12,123,1234,12345] => [1,1,2,1,2,3,1,2,3,4,1,2,3,4,5]

then I'd write some python that looked a bit like this:

from functools import reduce  # reduce is not a builtin in Python 3

somedata = [1,12,123,1234,12345]

listified = map(lambda x: [int(c) for c in str(x)], somedata)
flattened = reduce(lambda x, y: x + y, listified, [])

but I would prefer not to need the flattened = reduce(...) step if there is a neater (or more efficient) way to express this.

Upvotes: 2

Views: 1180

Answers (3)

remykarem

Reputation: 2469

Here's how the transformation goes:

  1. Convert every item (int) to a string: 12 -> '12'
  2. Convert every item (str) to a list of strings: '12' -> ['1', '2']
  3. Flatten every item (list of str): ['1', '2'] -> '1', '2'
  4. Convert every item (str) to an int: '1' -> 1

We can use Pyterator for this:

from pyterator import iterate

(
    iterate([1, 12, 123, 1234, 12345])
    .flat_map(lambda x: list(str(x)))  # Steps 1-3
    .map(int)  # Step 4
    .to_list()
)
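The same flat_map-then-map pipeline can be sketched in plain Python without a third-party library; the `flat_map` helper below is my own illustration, not part of Pyterator's API:

```python
def flat_map(func, iterable):
    # Apply func to each item and yield the result's elements one by one,
    # flattening one level of nesting on the fly (steps 1-3 above).
    for item in iterable:
        yield from func(item)

somedata = [1, 12, 123, 1234, 12345]

# str(x) is itself iterable over characters, then map(int, ...) is step 4
digits = list(map(int, flat_map(str, somedata)))
print(digits)  # [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
```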

Upvotes: 0

unutbu

Reputation: 879251

map(func, *iterables) will always call func as many times as the length of the shortest iterable (assuming no Exception is raised). Functions always return a single object. So list(map(func, *iterables)) will always have the same length as the shortest iterable.

Thus list(map(lambda x:[int(c) for c in str(x)], somedata)) will always have the same length as somedata. There is no way around that.

If the desired result (e.g. [1,1,2,1,2,3,1,2,3,4,1,2,3,4,5]) has more items than the input (e.g. [1,12,123,1234,12345]) then something other than map must be used to produce it.
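The length invariant is easy to check with a small sketch: `map` produces exactly one output object per input item, so the nesting survives until some other tool flattens it.

```python
somedata = [1, 12, 123, 1234, 12345]

# One list per input item -- map cannot change the number of items
nested = list(map(lambda x: [int(c) for c in str(x)], somedata))

assert len(nested) == len(somedata)
print(nested[2])  # [1, 2, 3]
```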

You could, for example, use itertools.chain.from_iterable to flatten one level of nesting:

In [31]: import itertools as IT

In [32]: somedata = [1,12,123,1234,12345]

In [33]: list(map(int, IT.chain.from_iterable(map(str, somedata))))
Out[33]: [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]

or, to flatten a list of lists, sum(..., []) suffices:

In [44]: sum(map(lambda x:[int(c) for c in str(x)], somedata), [])
Out[44]: [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]

but note that this is much slower than using IT.chain.from_iterable (see below).
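The slowdown comes from sum building a fresh list at every + step: each concatenation copies everything accumulated so far, which is quadratic in the total number of elements, while chain.from_iterable yields elements lazily with no intermediate lists. A minimal illustration on a toy list of lists:

```python
import itertools as IT

lists = [[1, 2], [3, 4], [5, 6]]

# sum copies the growing accumulator at every step:
# [] + [1, 2], then [1, 2] + [3, 4], then [1, 2, 3, 4] + [5, 6]
flat_sum = sum(lists, [])

# chain.from_iterable walks the sublists lazily, no intermediate copies
flat_chain = list(IT.chain.from_iterable(lists))

assert flat_sum == flat_chain == [1, 2, 3, 4, 5, 6]
```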


Here is a benchmark (using IPython's %timeit) testing the various methods on a list of 10,000 integers from 0 to a million:

In [4]: import random
In [8]: import functools
In [49]: somedata = [random.randint(0, 10**6) for i in range(10**4)]

In [50]: %timeit list(map(int, IT.chain.from_iterable(map(str, somedata))))
100 loops, best of 3: 9.35 ms per loop

In [13]: %timeit [int(i) for i in list(''.join(str(somedata)[1:-1].replace(', ','')))]
100 loops, best of 3: 12.2 ms per loop

In [52]: %timeit [int(j) for i in somedata for j in str(i)]
100 loops, best of 3: 12.3 ms per loop

In [51]: %timeit sum(map(lambda x:[int(c) for c in str(x)], somedata), [])
1 loop, best of 3: 869 ms per loop

In [9]: %timeit listified = map(lambda x:[int(c) for c in str(x)], somedata); functools.reduce(lambda x,y: x+y,listified,[])
1 loop, best of 3: 871 ms per loop

Upvotes: 3

yourstruly

Reputation: 1002

I have two ideas. The first uses a list comprehension:

print([int(j) for i in somedata for j in list(str(i))])

As noted in the comments, a string is already iterable, so this simplifies to:

print([int(j) for i in somedata for j in str(i)])

The second uses string operations together with a list comprehension:

print([int(i) for i in str(somedata)[1:-1].replace(', ', '')])

Output for both:

[1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
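A quick sanity check that both expressions agree on the example data (assuming non-negative integers; a minus sign in str(somedata) would break the digit split):

```python
somedata = [1, 12, 123, 1234, 12345]

# nested comprehension over each number's digits
a = [int(j) for i in somedata for j in str(i)]

# strip the brackets and separators from the repr of the whole list
b = [int(i) for i in str(somedata)[1:-1].replace(', ', '')]

assert a == b == [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
```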

Upvotes: 2
