dorvak
dorvak

Reputation: 9709

Python equivalent of R "split"-function

In R, you could split a vector according to the factors of another vector:

> a <- 1:10
  [1]  1  2  3  4  5  6  7  8  9 10
> b <- rep(1:2,5)
  [1] 1 2 1 2 1 2 1 2 1 2

> split(a,b)

   $`1`
   [1] 1 3 5 7 9
   $`2`
   [1]  2  4  6  8 10

Thus, grouping a list (in terms of python) according to the values of another list (according to the order of the factors).

Is there anything handy in python like that, except from the itertools.groupby approach?

Upvotes: 6

Views: 4076

Answers (4)

Zelazny7
Zelazny7

Reputation: 40628

As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:

a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]

from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res

>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})

Upvotes: 1

Matthew Plourde
Matthew Plourde

Reputation: 44614

Edit: warning, this a groupby solution, which is not what OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.


Here's one way with itertools.

import itertools
# make your sample data
a = range(1,11)
b = zip(*zip(range(len(a)), itertools.cycle((1,2))))[1]

{k: zip(*g)[1] for k, g in itertools.groupby(sorted(zip(b,a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}

This gives you a dictionary, which is analogous to the named list that you get from R's split.

Upvotes: 1

RMcG
RMcG

Reputation: 1095

You could try:

a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]

split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]

results in:

In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]

In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]

To make this generalise you can simply iterate over the unique elements in b:

splits = {}
for index in set(b):
   splits[index] =  [a[k] for k in (i for i,j in enumerate(b) if j == index)]

Upvotes: 0

John Spong
John Spong

Reputation: 1381

From your example, it looks like each element in b contains the 1-indexed list in which the node will be stored. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. If you can do zero-indexed lists, and you only need two lists (i.e., for your R use case, 1 and 2 are the only values, in python they'll be 0 and 1)

>>> a = range(1, 11)
>>> b = [0,1] * 5

>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])

Then you can use itertools.compress:

def split(x, f):
    return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))

If you need more general input (multiple numbers), something like the following will return an n-tuple:

def split(x, f):
    count = max(f) + 1
    return tuple( list(itertools.compress(x, (el == i for el in f))) for i in xrange(count) )  

>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])

Upvotes: 6

Related Questions