hoang tran

2D data binning with overlapping bins in Python

I have data in XYZ form, i.e. coordinates x, y with a value z at each point. For example:

import numpy as np
x = np.arange(100)
y = np.arange(100)
z = np.random.random_sample((100,))

I would like to bin the data into overlapping bins, for example of width dx = 2 and dy = 2, with one bin starting at every grid point x[i], y[j]. What I did is:

dx, dy = 2, 2
nx = len(x)
ny = len(y)
bin_data = np.zeros((nx, ny))
for i in range(nx):
    for j in range(ny):
        for a, b, c in zip(x, y, z):
            # point (a, b) lies in bin (i, j) if x[i] < a < x[i] + dx and y[j] < b < y[j] + dy
            if (x[i] < a < x[i] + dx) and (y[j] < b < y[j] + dy):
                bin_data[i, j] += c

For small data sets the program runs fine, but it takes far too long when the data are large: the three nested loops perform nx · ny · N comparisons for N points. Can you please recommend a faster algorithm for overlapping binning in Python? I know numpy.histogram2d is quite fast, but it does not support overlapping bins.
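
Under the extra assumptions of the example (integer point coordinates inside the grid, x[i] == i, y[j] == j, integer dx and dy), numpy.histogram2d can still do most of the work indirectly. The sketch below histograms the points into non-overlapping unit cells and then obtains every overlapping bin as a window sum over those cells using a 2D cumulative sum (a summed-area table). The function name overlapping_bins_via_histogram is hypothetical, and points outside 0..nx-1 / 0..ny-1 are ignored, which differs from the loop above only for such outside points.

```python
import numpy as np

def overlapping_bins_via_histogram(px, py, pz, nx, ny, dx, dy):
    """Bin (i, j) sums pz over points with i < px < i + dx and j < py < j + dy.

    Assumes integer point coordinates in [0, nx) x [0, ny) and integer dx, dy >= 1,
    so bin (i, j) covers the integer cells a in [i + 1, i + dx), b in [j + 1, j + dy).
    """
    # 1) Non-overlapping unit cells: edges at -0.5, 0.5, ..., n - 0.5,
    #    so each integer coordinate gets its own cell.
    H, _, _ = np.histogram2d(
        px, py,
        bins=[np.arange(nx + 1) - 0.5, np.arange(ny + 1) - 0.5],
        weights=pz,
    )
    # 2) Zero-pad so windows running past the grid edge just see empty cells,
    #    then build the summed-area table S[p, q] = Hp[:p, :q].sum().
    Hp = np.zeros((nx + dx, ny + dy))
    Hp[:nx, :ny] = H
    S = np.zeros((nx + dx + 1, ny + dy + 1))
    S[1:, 1:] = Hp.cumsum(axis=0).cumsum(axis=1)
    # 3) Window sum over a in [a0, a1), b in [b0, b1) for all bins at once.
    i = np.arange(nx)[:, None]   # column of row indices
    j = np.arange(ny)[None, :]   # row of column indices
    a0, a1 = i + 1, i + dx
    b0, b1 = j + 1, j + dy
    return S[a1, b1] - S[a0, b1] - S[a1, b0] + S[a0, b0]
```

With the arrays above this would be called as overlapping_bins_via_histogram(x, y, z, len(x), len(y), dx, dy). The cost is one histogram plus O(nx · ny) arithmetic, independent of how many bins each point falls into; the cumulative sums can introduce small floating-point rounding differences compared with the loop.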

Answers (1)

Serge Ballesta

I think you could easily make your algorithm faster by moving the zip loop outside the two other loops, as IMHO it is the most expensive operation:

for a, b, c in zip(x,y,z):
    for i in range(nx):
        for j in range(ny):
            ...
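
Spelled out with the question's own tests, the reordered loops might look like the sketch below. The function name bin_overlapping_reordered is hypothetical; it assumes, as in the question, that the grid is given by x and y themselves. Only the loop order changes, so the total number of comparisons is still nx · ny · N.

```python
import numpy as np

def bin_overlapping_reordered(x, y, z, dx, dy):
    """Brute-force overlapping binning with the point loop outermost.

    Bin (i, j) sums z over points (a, b) with x[i] < a < x[i] + dx
    and y[j] < b < y[j] + dy, exactly as in the question's code.
    """
    nx, ny = len(x), len(y)
    bin_data = np.zeros((nx, ny))
    for a, b, c in zip(x, y, z):        # the zip is created and walked only once
        for i in range(nx):
            for j in range(ny):
                if (x[i] < a < x[i] + dx) and (y[j] < b < y[j] + dy):
                    bin_data[i, j] += c
    return bin_data
```

Calling bin_overlapping_reordered(x, y, z, 2, 2) with the question's arrays gives the same bin_data as the original triple loop, since each bin still receives its points in the same order.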

Then, in your example, you can use the fact that x[i] == i and y[j] == j (the + 1 is there because your comparisons use a strict <):

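for a, b, c in zip(x, y, z):
    # bins i with x[i] < a < x[i] + dx are a - dx + 1 .. a - 1; clip to [0, nx)
    # so that negative indices do not silently wrap around to the end of bin_data
    for i in range(max(a - dx + 1, 0), min(a, nx)):
        for j in range(max(b - dy + 1, 0), min(b, ny)):
            bin_data[i, j] += c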
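
The same index inversion can also be vectorised over the points, so that the only Python-level loops run over the (dx - 1) · (dy - 1) offsets instead of over the data. The sketch below swaps in np.add.at for the accumulation; the function name bin_overlapping_add_at is hypothetical, and it assumes, as in the example, x[i] == i, y[j] == j, integer point coordinates and integer dx, dy.

```python
import numpy as np

def bin_overlapping_add_at(a, b, c, nx, ny, dx, dy):
    """Overlapping binning on the integer grid x[i] == i, y[j] == j.

    A point (a, b) falls into bin (i, j) when i < a < i + dx and j < b < j + dy,
    i.e. i = a - p and j = b - q for p in 1..dx-1 and q in 1..dy-1.
    """
    a = np.asarray(a, dtype=np.intp)
    b = np.asarray(b, dtype=np.intp)
    c = np.asarray(c, dtype=float)
    bin_data = np.zeros((nx, ny))
    for p in range(1, dx):
        for q in range(1, dy):
            i, j = a - p, b - q
            # Keep only bins inside the grid (negative indices would wrap around).
            ok = (i >= 0) & (i < nx) & (j >= 0) & (j < ny)
            # np.add.at is unbuffered, so several points hitting the same bin all count.
            np.add.at(bin_data, (i[ok], j[ok]), c[ok])
    return bin_data
```

With the question's arrays this would be called as bin_overlapping_add_at(x, y, z, len(x), len(y), 2, 2). np.add.at is used rather than bin_data[i, j] += c because plain fancy-index += is buffered: when two points map to the same (i, j), only one of them would be added.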

In fact, you can apply that second optimisation whenever x[i] = f(i) and y[j] = g(j) with f and g monotonic and easily invertible, so that i = f⁻¹(x) and j = g⁻¹(y).
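
For a grid that is sorted but not simply x[i] == i, one way to compute f⁻¹ is a binary search. The sketch below assumes x and y are sorted in ascending order and uses np.searchsorted for the inversion; because the bins hit by one point then form a contiguous block of rows and columns, the two inner loops become a single slice update. The function name bin_overlapping_sorted_grid and the separate point arrays px, py, pz are hypothetical.

```python
import numpy as np

def bin_overlapping_sorted_grid(x, y, px, py, pz, dx, dy):
    """Overlapping binning on ascending (not necessarily uniform) grids x, y.

    Bin (i, j) sums pz over points with x[i] < px < x[i] + dx and
    y[j] < py < y[j] + dy. On a sorted grid the valid i form a contiguous
    range: x[i] > px - dx starts at searchsorted(..., side="right") and
    x[i] < px stops before searchsorted(..., side="left").
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bin_data = np.zeros((len(x), len(y)))
    for a, b, c in zip(px, py, pz):
        i0 = np.searchsorted(x, a - dx, side="right")  # first i with x[i] > a - dx
        i1 = np.searchsorted(x, a, side="left")        # first i with x[i] >= a
        j0 = np.searchsorted(y, b - dy, side="right")
        j1 = np.searchsorted(y, b, side="left")
        bin_data[i0:i1, j0:j1] += c  # empty slice when no bin contains the point
    return bin_data
```

The question's setup corresponds to bin_overlapping_sorted_grid(x, y, x, y, z, 2, 2). Each point then costs two binary searches per axis plus one slice update, instead of a scan over all nx · ny bins.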
