Reputation: 9709
In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
Thus, grouping a list (in terms of python) according to the values of another list (according to the order of the factors).
Is there anything handy in python like that, except from the itertools.groupby
approach?
Upvotes: 6
Views: 4076
Reputation: 40628
As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
res = defaultdict(list)
for v, k in zip(x, f):
res[k].append(v)
return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
Upvotes: 1
Reputation: 44614
Edit: warning, this a groupby
solution, which is not what OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.
Here's one way with itertools
.
import itertools
# make your sample data
a = range(1,11)
b = zip(*zip(range(len(a)), itertools.cycle((1,2))))[1]
{k: zip(*g)[1] for k, g in itertools.groupby(sorted(zip(b,a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split
.
Upvotes: 1
Reputation: 1095
You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To make this generalise you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
splits[index] = [a[k] for k in (i for i,j in enumerate(b) if j == index)]
Upvotes: 0
Reputation: 1381
From your example, it looks like each element in b contains the 1-indexed list in which the node will be stored. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. If you can do zero-indexed lists, and you only need two lists (i.e., for your R use case, 1 and 2 are the only values, in python they'll be 0 and 1)
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
Then you can use itertools.compress
:
def split(x, f):
return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
count = max(f) + 1
return tuple( list(itertools.compress(x, (el == i for el in f))) for i in xrange(count) )
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
Upvotes: 6