Reputation: 2753

How to group rows in a NumPy 2D matrix based on column values?

What would be an efficient (fast and simple) way to group the rows of a 2D NumPy matrix by column conditions (e.g. group by the values in column 2) and run f1() and f2() on each of those groups?

Thanks

Upvotes: 5

Answers (3)

Joran Beasley

Reputation: 113988

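from operator import itemgetter
# Sort the rows by the value in column 1
sorted(my_numpy_array, key=itemgetter(1))

or maybe something like

from itertools import groupby
from operator import itemgetter
# groupby only merges consecutive rows with equal keys, so sort by the key column first
rows = sorted(my_numpy_array, key=itemgetter(1))
for key, group in groupby(rows, key=itemgetter(1)):
    print(key, list(group))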

Upvotes: 1

Eelco Hoogendoorn

Reputation: 10759

A compact solution is to use numpy_indexed (disclaimer: I am its author), which implements a fully vectorized solution to this type of problem.

The simplest way to use it is as:

import numpy_indexed as npi
npi.group_by(arr[:, col1]).mean(arr)

But this also works:

# run function f1 on each group; the groups are formed by keys, which are the rows of arr[:, [col1, col2]]
npi.group_by(arr[:, [col1, col2]], arr, f1)
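For comparison, a group-wise mean along the lines of the first call above can be sketched in plain NumPy. This is not how numpy_indexed is implemented; group_mean is a hypothetical helper and the sample array is made up for illustration.

```python
import numpy as np

def group_mean(arr, col):
    """Mean of every column, computed per distinct value in column `col`.

    Hypothetical helper for illustration; not part of NumPy or numpy_indexed.
    """
    # keys: sorted distinct values; inverse: group index of every row
    keys, inverse = np.unique(arr[:, col], return_inverse=True)
    sums = np.zeros((len(keys), arr.shape[1]), dtype=float)
    # Unbuffered add: each row is accumulated into its group's slot
    np.add.at(sums, inverse, arr)
    counts = np.bincount(inverse, minlength=len(keys))
    return keys, sums / counts[:, None]

# Made-up sample data: column 0 is the grouping key
arr = np.array([[1, 10.0],
                [2, 20.0],
                [1, 30.0],
                [2, 40.0],
                [3, 50.0]])

keys, means = group_mean(arr, 0)
print(keys)
print(means)
```

This only covers reductions that can be written as a sum and a count; an arbitrary f1 still needs the rows of each group collected explicitly.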

Upvotes: 6

Jaime

Reputation: 67427

If you have an array arr of shape (rows, cols), you can get the vector of all values in column 2 (counting from zero) as

col = arr[:, 2]

You can then construct a boolean array from your grouping condition. Say group 1 is made up of those rows that have a value larger than 5 in column 2:

idx = col > 5

You can apply this boolean array directly to your original array to select rows:

group_1 = arr[idx]
group_2 = arr[~idx]

For example:

>>> arr = np.random.randint(10, size=(6,4))
>>> arr
array([[0, 8, 7, 4],
       [5, 2, 6, 9],
       [9, 5, 7, 5],
       [6, 9, 1, 5],
       [8, 0, 5, 8],
       [8, 2, 0, 6]])
>>> idx = arr[:, 2] > 5
>>> arr[idx]
array([[0, 8, 7, 4],
       [5, 2, 6, 9],
       [9, 5, 7, 5]])
>>> arr[~idx]
array([[6, 9, 1, 5],
       [8, 0, 5, 8],
       [8, 2, 0, 6]])
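
If you need one group per distinct value in a column rather than a two-way split, one possible approach is to sort by that column and split at the boundaries. The following is a minimal sketch under that assumption: group_rows is a hypothetical helper, and the f1 and f2 bodies are placeholders standing in for the functions from the question.

```python
import numpy as np

def group_rows(arr, col):
    """Split arr into one sub-array per distinct value in column `col`.

    Hypothetical helper for illustration.
    """
    # A stable sort keeps the original row order inside each group
    order = np.argsort(arr[:, col], kind="stable")
    sorted_arr = arr[order]
    # `first` holds the index where each group starts in the sorted array
    keys, first = np.unique(sorted_arr[:, col], return_index=True)
    # Split at every group start except the very first one
    return keys, np.split(sorted_arr, first[1:])

# Placeholder implementations; the question does not say what f1 and f2 do
def f1(group):
    return group.sum(axis=0)

def f2(group):
    return group.mean(axis=0)

arr = np.array([[0, 8, 7, 4],
                [5, 2, 6, 9],
                [9, 5, 7, 5],
                [6, 9, 1, 5],
                [8, 0, 5, 8],
                [8, 2, 0, 6]])

keys, groups = group_rows(arr, 2)
results = {int(k): (f1(g), f2(g)) for k, g in zip(keys, groups)}
for k, (r1, r2) in results.items():
    print(k, r1, r2)
```

Sorting once and splitting avoids building a separate boolean mask for every distinct value, which matters when there are many groups.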

Upvotes: 10
