Gabriel

Reputation: 42329

Histogram in N dimensions with numpy

I'm trying to generate two 2D histograms using numpy.histogramdd. (I'm aware of histogram2d, but I need histogramdd so the code can eventually scale to N dimensions.)

Both histograms should use the same range, so I define it before obtaining them.

The issue is that I can't get my code to work: depending on the configuration, I get either ValueError: too many values to unpack or ValueError: sequence too large; must be smaller than 32.

Here's the MWE:

import numpy as np

def rand_data(N):
    return np.random.uniform(low=1., high=2000., size=(N,))

# Some random 2D data.
N = 100
P = [rand_data(N), rand_data(N)]
Q = [rand_data(N), rand_data(N)]

# Number of bins.
b = np.sqrt(len(P[0])) * 2

# Max and min values for x and y
x_min = np.sort(np.minimum(P[0], Q[0]))[0]
x_max = np.sort(np.minimum(P[0], Q[0]))[-1]
y_min = np.sort(np.minimum(P[1], Q[1]))[0]
y_max = np.sort(np.minimum(P[1], Q[1]))[-1]
# Range for the histograms.
rang = [np.linspace(x_min, x_max, b), np.linspace(y_min, y_max, b)]

# Histograms
d_1 = np.histogramdd(zip(*[P[0], P[1]]), range=rang)[0]
d_2 = np.histogramdd(zip(*[Q[0], Q[1]]), range=rang)[0]

What am I doing wrong?

Upvotes: 1

Views: 2300

Answers (1)

cel

Reputation: 31349

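The following code should work for you. There are two issues. First, the edges of your bins must be passed to the bins argument, not to the range argument; range expects one (lower, upper) pair per dimension rather than full arrays of edges. Second, passing the zipped tuples directly does not seem to work. If you convert those tuples to a numpy array of shape (N, D), with one row per point, and pass that array, it should work as expected.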

This code works for me:

import numpy as np

def rand_data(N):
    return np.random.uniform(low=1., high=2000., size=(N,))

# Some random 2D data.
N = 100
P = [rand_data(N), rand_data(N)]
Q = [rand_data(N), rand_data(N)]

# Number of bin edges per axis (np.linspace needs an integer count).
b = int(np.sqrt(len(P[0])) * 2)

# Max and min values for x and y, taken over both data sets
# (np.maximum, not np.minimum, for the upper limits).
x_min = np.minimum(P[0], Q[0]).min()
x_max = np.maximum(P[0], Q[0]).max()
y_min = np.minimum(P[1], Q[1]).min()
y_max = np.maximum(P[1], Q[1]).max()
# Bin edges for the histograms (shared by both, passed via bins=).
rang = [np.linspace(x_min, x_max, b), np.linspace(y_min, y_max, b)]

# Histograms: each sample is an (N, 2) array, one row per point.
sample1 = np.array(list(zip(*[P[0], P[1]])))
sample2 = np.array(list(zip(*[Q[0], Q[1]])))
d_1 = np.histogramdd(sample1, bins=rang)[0]
d_2 = np.histogramdd(sample2, bins=rang)[0]
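
Since the question mentions scaling to N dimensions, here is a minimal sketch of how the same idea might generalize. It assumes the data is already stored as (N, D) arrays; the helper name shared_edges, the seed, and the choice of 3 dimensions and 5 bins are illustrative and not part of the original code. It also shows the alternative of passing an integer bin count together with range given as (lower, upper) pairs.

```python
import numpy as np

def shared_edges(samples, n_bins):
    """Hypothetical helper: n_bins + 1 edges per dimension, covering all samples."""
    stacked = np.vstack(samples)          # shape (total_points, D)
    lo = stacked.min(axis=0)              # per-dimension minimum
    hi = stacked.max(axis=0)              # per-dimension maximum
    return [np.linspace(l, h, n_bins + 1) for l, h in zip(lo, hi)]

# Illustrative data: two sets of N points in D dimensions.
rng = np.random.default_rng(0)
N, D = 100, 3
P = rng.uniform(1.0, 2000.0, size=(N, D))
Q = rng.uniform(1.0, 2000.0, size=(N, D))

n_bins = 5
edges = shared_edges([P, Q], n_bins)

# Option 1: explicit edges go in bins=.
h_p, _ = np.histogramdd(P, bins=edges)
h_q, _ = np.histogramdd(Q, bins=edges)

# Option 2: integer bin count in bins=, (lower, upper) pairs in range=.
ranges = [(e[0], e[-1]) for e in edges]
h_p2, _ = np.histogramdd(P, bins=n_bins, range=ranges)
```

Both options give histograms on the same grid, so h_p and h_q can be compared bin by bin. If the data comes as a list of 1D coordinate arrays, as in the question, np.column_stack(P) should produce the required (N, D) layout.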

Upvotes: 1
