Reputation: 14318
I have two lists of equal length. The first list, l1, contains data:
l1 = [2, 3, 5, 7, 8, 10, ... , 23]
The second list, l2, contains the category that each item in l1 belongs to:
l2 = [1, 1, 2, 1, 3, 4, ... , 3]
How can I partition the first list according to the category numbers (1, 2, 3, 4, ...) in the second list, using a list comprehension or a lambda function? For example, 2, 3 and 7 from the first list belong to the same partition because they all have the value 1 at the corresponding positions in the second list.
The number of partitions is known at the beginning.
Upvotes: 4
Views: 298
Reputation: 4010
This is a dictionary comprehension rather than a list comprehension. It resembles @cromod's solution but preserves the category labels from l2:
{k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
Output:
>>> l1
[2, 3, 5, 7, 8, 10, 23]
>>> l2
[1, 1, 2, 1, 3, 4, 3]
>>> {k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
>>>
Upvotes: 1
Reputation: 1056
Using some itertools and operator goodies and a sort, you can do this in a one-liner:
>>> import itertools, operator
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0))
The result is an itertools.groupby object that can be iterated over:
>>> for g, li in itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0)):
...     print(g, list(map(operator.itemgetter(1), li)))
...
1 [2, 3, 7]
2 [5]
3 [8, 23]
4 [10]
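One thing to watch with the one-liner above: sorted(zip(l2, l1)) sorts on the whole (category, value) tuple, so the values inside each group come out sorted rather than in their original order. A minimal sketch that sorts on the category only, relying on sorted() being stable (the function name partition_by_category and the second, unsorted input are just for illustration):

```python
from itertools import groupby
from operator import itemgetter


def partition_by_category(values, categories):
    """Group values by category, keeping their original relative order."""
    # Sort the (category, value) pairs on the category alone. sorted() is
    # stable, so values within one category keep the order they had in `values`.
    pairs = sorted(zip(categories, values), key=itemgetter(0))
    return {
        category: [value for _, value in group]
        for category, group in groupby(pairs, key=itemgetter(0))
    }


l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]
print(partition_by_category(l1, l2))
# {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}

# Illustrative input whose values are not already sorted:
print(partition_by_category([9, 3, 5, 7], [1, 1, 2, 1]))
# {1: [9, 3, 7], 2: [5]}
```

With the question's data both variants give the same result, because l1 happens to be sorted already.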
Upvotes: 1
Reputation: 25023
If it is reasonable to store your data in numpy ndarrays, you can use extended (boolean mask) indexing
{i:l1[l2==i] for i in set(l2)}
to construct a dictionary of ndarrays indexed by category code.
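For completeness, here is a runnable sketch of that expression using the question's data. It assumes numpy is installed, and it uses np.unique in place of set(l2) to get the distinct category codes in sorted order; that swap is my choice, not part of the expression above.

```python
import numpy as np

l1 = np.array([2, 3, 5, 7, 8, 10, 23])
l2 = np.array([1, 1, 2, 1, 3, 4, 3])

# np.unique returns the distinct category codes, sorted.
categories = np.unique(l2)

# l2 == c builds a Boolean mask with one entry per element; indexing l1
# with that mask keeps only the elements whose category is c.
partitions = {int(c): l1[l2 == c] for c in categories}

for category, values in partitions.items():
    print(category, values.tolist())
# 1 [2, 3, 7]
# 2 [5]
# 3 [8, 23]
# 4 [10]
```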
There is an overhead associated with l2==i (i.e., building a new Boolean array for each category) that grows with the number of categories, so you may want to check which alternative, numpy or defaultdict, is faster with your data.
I tested with n=200000, nc=20 and numpy was faster than defaultdict + izip (124 vs 165 ms), but with nc=10000 numpy was (much) slower (11300 vs 251 ms).
Upvotes: 1
Reputation: 692
A nested list comprehension:
[[l1[j] for j in range(len(l1)) if l2[j] == i] for i in range(1, max(l2) + 1)]
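Note that this form assumes the categories are the integers 1 up to max(l2). Since the question says the number of partitions is known in advance, a variant can take that count directly. A small sketch, where n_partitions = 5 is a made-up value chosen to show what happens to a category that never occurs:

```python
l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]

# Assumption: categories are the integers 1..n_partitions and the count is
# known up front. 5 is a hypothetical value; the data only uses 1..4.
n_partitions = 5

partitions = [
    [value for value, category in zip(l1, l2) if category == k]
    for k in range(1, n_partitions + 1)
]
print(partitions)
# [[2, 3, 7], [5], [8, 23], [10], []]
```

A category with no members simply yields an empty list, so partitions[k - 1] always corresponds to category k.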
Upvotes: 1
Reputation: 1809
This gives a list of partitions using a list comprehension:
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> [[value for i, value in enumerate(l1) if j == l2[i]] for j in set(l2)]
[[2, 3, 7], [5], [8, 23], [10]]
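Because the result is a plain list, the only link between a partition and its category is its position, and the iteration order of a set is not guaranteed by the language. A hedged variant that sorts the distinct categories first and keeps them alongside the partitions (the name categories is just for illustration):

```python
l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]

# Fix the order of the categories explicitly instead of relying on
# whatever order the set happens to iterate in.
categories = sorted(set(l2))

partitions = [
    [value for i, value in enumerate(l1) if l2[i] == j]
    for j in categories
]

# partitions[n] holds the values for categories[n].
for category, values in zip(categories, partitions):
    print(category, values)
# 1 [2, 3, 7]
# 2 [5]
# 3 [8, 23]
# 4 [10]
```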
Upvotes: 2
Reputation: 78690
If a dict is fine, I suggest using a defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for number, category in zip(l1, l2):
...     d[category].append(number)
...
>>> d
defaultdict(<type 'list'>, {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]})
Consider using itertools.izip for memory efficiency if you are using Python 2.
This is basically the same solution as Kasramvd's, but I think the defaultdict makes it a little easier to read.
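On Python 3 the built-in zip is already lazy, so izip is not needed there. A minimal sketch of the same idea wrapped in a function (the name partition is hypothetical), which makes a single pass over the data regardless of how many categories there are:

```python
from collections import defaultdict


def partition(values, categories):
    """Group values by category in one pass over both lists."""
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        # A missing key is created as an empty list on first access.
        groups[category].append(value)
    # Convert to a plain dict so later lookups of unknown categories raise
    # KeyError instead of silently creating empty lists.
    return dict(groups)


l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]
print(partition(l1, l2))
# {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
```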
Upvotes: 7
Reputation: 107287
You can use a dictionary:
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> d = {}
>>> for i, j in zip(l1, l2):
...     d.setdefault(j, []).append(i)
...
>>>
>>> d
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
Upvotes: 8