S L

Reputation: 14318

Mapping two lists without looping

I have two lists of equal length. The first list l1 contains data.

l1 = [2, 3, 5, 7, 8, 10, ... , 23]

The second list l2 contains the category the data in l1 belongs to:

l2 = [1, 1, 2, 1, 3, 4, ... , 3]

How can I partition the first list based on the categories (1, 2, 3, 4, and so on) given in the second list, using a list comprehension or a lambda function? For example, 2, 3, and 7 from the first list belong to the same partition because they have the same corresponding value in the second list.

The number of partitions is known at the beginning.

Upvotes: 4

Views: 298

Answers (7)

jDo

Reputation: 4010

This is not a list comprehension but a dictionary comprehension. It resembles @cromod's solution but preserves the "categories" from l2:

{k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}

Output:

>>> l1
[2, 3, 5, 7, 8, 10, 23]
>>> l2
[1, 1, 2, 1, 3, 4, 3]
>>> {k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
>>> 

Upvotes: 1

user1556435

Reputation: 1056

Using some itertools and operator goodies and a sort, you can do this in a one-liner:

>>> import itertools, operator
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0))

The result is an itertools.groupby object that can be iterated over:

>>> for g, li in itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0)):
...     print(g, list(map(operator.itemgetter(1), li)))

1 [2, 3, 7]
2 [5]
3 [8, 23]
4 [10]
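If a dictionary is more convenient than iterating over the groupby object, the same idea can be wrapped in a small function. This is only a sketch: partition_by_groupby is a hypothetical helper name, not something itertools provides. Unlike the one-liner above, it sorts on the category alone, so values keep their original order within each category instead of being sorted.

```python
import itertools
import operator


def partition_by_groupby(data, categories):
    """Group data by category with sort + groupby (hypothetical helper)."""
    # groupby only merges adjacent items, so equal categories must be made
    # adjacent first. Sorting on the category alone is stable, which keeps
    # the values in their original order inside each category.
    pairs = sorted(zip(categories, data), key=operator.itemgetter(0))
    return {
        category: [value for _, value in group]
        for category, group in itertools.groupby(pairs, key=operator.itemgetter(0))
    }


l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]
result = partition_by_groupby(l1, l2)
print(result)
```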

Upvotes: 1

gboffi

Reputation: 25023

If it is reasonable to store your data in numpy ndarrays, you can use Boolean (mask) indexing

{i:l1[l2==i] for i in set(l2)}

to construct a dictionary of ndarrays indexed by category code.

There is an overhead associated with l2==i (i.e., building a new Boolean array for each category) that grows with the number of categories, so that you may want to check which alternative, either numpy or defaultdict, is faster with your data.

I tested with n=200000, nc=20, and numpy was faster than defaultdict + izip (124 vs 165 ms), but with nc=10000 numpy was much slower (11300 vs 251 ms).
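For reference, here is a runnable version of the mask approach on the question's sample data, next to one possible way to avoid building a mask per category: sort once by category and split at the boundaries. The sort-and-split variant is my own assumption about how to sidestep the per-category overhead; it is not part of the answer above and I have not timed it.

```python
import numpy as np

l1 = np.array([2, 3, 5, 7, 8, 10, 23])
l2 = np.array([1, 1, 2, 1, 3, 4, 3])

# One Boolean mask per category, as in the dictionary comprehension above.
by_mask = {int(c): l1[l2 == c] for c in np.unique(l2)}

# Alternative sketch: sort once by category, then cut at category boundaries.
order = np.argsort(l2, kind="stable")  # stable: keeps original order per category
keys, starts = np.unique(l2[order], return_index=True)  # first index of each category
by_split = dict(zip(keys.tolist(), np.split(l1[order], starts[1:])))

print(by_mask)
print(by_split)
```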

Upvotes: 1

nino_701

Reputation: 692

A nested list comprehension:

[[l1[j] for j in range(len(l1)) if l2[j] == i] for i in range(1, max(l2) + 1)]
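The comprehension above rescans l1 once per category. Since the number of partitions is known up front, a single pass over both lists is also possible, at the cost of an explicit loop. A minimal sketch, assuming the categories are exactly the integers 1 to n_partitions (partition_known is a hypothetical name):

```python
def partition_known(data, categories, n_partitions):
    """Single-pass partition into n_partitions lists (hypothetical helper)."""
    # Assumption: categories are the integers 1..n_partitions, so category c
    # maps to list index c - 1. Categories with no data stay as empty lists.
    parts = [[] for _ in range(n_partitions)]
    for value, category in zip(data, categories):
        parts[category - 1].append(value)
    return parts


l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]
parts = partition_known(l1, l2, 4)
print(parts)
```

Like the comprehension, this returns an empty list for any category that never appears in l2.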

Upvotes: 1

cromod

Reputation: 1809

This gives a list of partitions using a list comprehension:

>>> l1 = [2, 3, 5, 7, 8, 10, 23] 
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> [[value for i, value in enumerate(l1) if j == l2[i]] for j in set(l2)]
[[2, 3, 7], [5], [8, 23], [10]]
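One caveat: a set has no guaranteed iteration order, so the order of the partitions produced by iterating over set(l2) is not something to rely on. A small variation, sketched here, sorts the categories explicitly and uses zip instead of indexing into l2:

```python
l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]

# sorted(set(l2)) fixes the order of the partitions; zip pairs each value
# with its category so no index lookup is needed.
partitions = [
    [value for value, category in zip(l1, l2) if category == wanted]
    for wanted in sorted(set(l2))
]
print(partitions)
```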

Upvotes: 2

timgeb

Reputation: 78690

If a dict is fine, I suggest using a defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for number, category in zip(l1, l2):
...     d[category].append(number)
... 
>>> d
defaultdict(<type 'list'>, {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]})

Consider using itertools.izip for memory efficiency if you are using Python 2.

This is basically the same solution as Kasramvd's, but I think the defaultdict makes it a little easier to read.
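On Python 3, zip is already lazy and itertools.izip no longer exists, so the plain zip call is enough. As a sketch, the loop can be wrapped in a function that returns a regular dict (the name partition is a hypothetical choice):

```python
from collections import defaultdict


def partition(data, categories):
    """Map each category to the list of values that belong to it."""
    groups = defaultdict(list)
    for value, category in zip(data, categories):
        # A missing category gets a fresh empty list on first access.
        groups[category].append(value)
    # Convert back to a plain dict so later lookups of unknown keys raise
    # KeyError instead of silently creating new entries.
    return dict(groups)


l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]
d = partition(l1, l2)
print(d)
```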

Upvotes: 7

Kasravnd

Reputation: 107287

You can use a dictionary:

>>> l1 = [2, 3, 5, 7, 8, 10, 23] 
>>> l2 = [1, 1, 2, 1, 3, 4, 3]

>>> d = {}
>>> for i, j in zip(l1, l2):
...     d.setdefault(j, []).append(i)
... 
>>> 
>>> d
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}

Upvotes: 8
