2D data binning with overlapping bins in Python

Question

I have data in XYZ form. For example:

import numpy as np

x = np.arange(100)
y = np.arange(100)
z = np.random.random_sample((100,))

I would like to bin the data into overlapping bins, for example with bin sizes of dx = 2 and dy = 2. What I did is:

dx, dy = 2, 2
nx = len(x)
ny = len(y)
bin_data = np.zeros((nx, ny))
for i in range(nx):
    for j in range(ny):
        # test every data point against bin (i, j)
        for a, b, c in zip(x, y, z):
            if (x[i] < a) and (a < x[i] + dx):
                if (y[j] < b) and (b < y[j] + dy):
                    bin_data[i, j] += c

For small data like this the program runs well. However, it takes far too long when the data set is large. Can you recommend a faster algorithm for binning data into overlapping bins in Python? I know numpy.histogram2d is quite fast, but it does not support overlapping bins.

Serge Ballesta · Accepted Answer

I think you could easily make your algorithm faster by moving the zip outside of the two other loops, as IMHO it is the longest operation:

for a, b, c in zip(x,y,z):
    for i in range(nx):
        for j in range(ny):
            ...

Then, in your example, you could make use of x[i] == i and y[j] == j (I add +1 because the comparisons are strict <):

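for a, b, c in zip(x, y, z):
    # bins with i < a < i + dx, i.e. i in [a - dx + 1, a - 1];
    # max(0, ...) keeps negative indices from wrapping around to the end of the array
    for i in range(max(0, a - dx + 1), a):
        for j in range(max(0, b - dy + 1), b):
            bin_data[i, j] += c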
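
One way to convince yourself the shortcut is equivalent is to run both versions on a small grid and compare the results. The following is only a sketch under the same assumption as above (x[i] == i and y[j] == j); the function names bin_brute_force and bin_direct are made up for illustration, and the grid is reduced to 20 points so the triple loop finishes quickly.

```python
import numpy as np

def bin_brute_force(x, y, z, dx, dy):
    """Original triple loop: test every point against every bin."""
    nx, ny = len(x), len(y)
    out = np.zeros((nx, ny))
    for i in range(nx):
        for j in range(ny):
            for a, b, c in zip(x, y, z):
                if x[i] < a < x[i] + dx and y[j] < b < y[j] + dy:
                    out[i, j] += c
    return out

def bin_direct(x, y, z, dx, dy):
    """Shortcut: visit only the bins that can contain each point.

    Assumes x[i] == i and y[j] == j (integer grid starting at 0).
    """
    nx, ny = len(x), len(y)
    out = np.zeros((nx, ny))
    for a, b, c in zip(x, y, z):
        # clamp to the valid index range so nothing wraps around or overflows
        for i in range(max(0, a - dx + 1), min(a, nx)):
            for j in range(max(0, b - dy + 1), min(b, ny)):
                out[i, j] += c
    return out

rng = np.random.default_rng(0)
n = 20
x = np.arange(n)
y = np.arange(n)
z = rng.random(n)

slow = bin_brute_force(x, y, z, 2, 2)
fast = bin_direct(x, y, z, 2, 2)
print(np.allclose(slow, fast))
```

The brute-force version does nx * ny * len(z) comparisons, while the direct version does roughly len(z) * dx * dy additions, which is where the speed-up comes from.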

In fact, you can apply that second optimisation whenever x = f(i) and y = g(j), with f and g monotonic and easily invertible, giving i = f^-1(x) and j = g^-1(y).
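
When the bin origins are sorted but have no closed-form inverse, a binary search can play the role of f^-1 and g^-1. The sketch below uses numpy.searchsorted for that; it is one possible approach, not something from the question, and the names bin_sorted_grid, gx and gy (the sorted arrays of bin origins) are hypothetical. Bin (i, j) is assumed to cover gx[i] < a < gx[i] + dx and gy[j] < b < gy[j] + dy, as in the original loops.

```python
import numpy as np

def bin_sorted_grid(x, y, z, gx, gy, dx, dy):
    """Sum z into overlapping bins whose origins gx, gy are sorted ascending."""
    out = np.zeros((len(gx), len(gy)))
    # gx[i] < a < gx[i] + dx  is the same as  a - dx < gx[i] < a
    i_lo = np.searchsorted(gx, x - dx, side="right")  # first i with gx[i] > a - dx
    i_hi = np.searchsorted(gx, x, side="left")        # first i with gx[i] >= a
    j_lo = np.searchsorted(gy, y - dy, side="right")
    j_hi = np.searchsorted(gy, y, side="left")
    for c, i0, i1, j0, j1 in zip(z, i_lo, i_hi, j_lo, j_hi):
        # every bin in this rectangle of indices contains the point
        out[i0:i1, j0:j1] += c
    return out

# small, non-uniform example grid (illustrative values)
gx = np.array([0.0, 0.5, 1.5, 2.0, 4.0])
gy = np.array([0.0, 1.0, 3.0])
x = np.array([0.75, 2.5, 4.5])
y = np.array([0.5, 3.5, 1.0])
z = np.array([1.0, 2.0, 3.0])

result = bin_sorted_grid(x, y, z, gx, gy, dx=1.0, dy=1.0)
print(result)
```

Only one Python-level loop over the data points remains, and each iteration is a single slice update. Note that with floating-point coordinates, a point lying exactly on a bin edge may be classified differently than in the original comparison because a - dx and gx[i] + dx are rounded separately.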
