johnhenry

Reputation: 1333

Python - Binning x,y,z values on a 2D grid

I have a list of z values, each associated with an (x, y) pair. For example:

x     y                    z
3.1   5.2                  1.3    
4.2   2.3                  9.3
5.6   9.8                  3.5

and so on. The total number of z values is relatively high, around 10000. I would like to bin my data, in the following sense:

1) I would like to split the x and y values into cells, so as to make a 2-dimensional grid in x,y. If I have Nx cells for the x axis and Ny for the y axis, I would then have Nx*Ny cells on the grid. For example, the first bin for x could range from 1. to 2., the second from 2. to 3., and so on.

2) For each cell in this 2-dimensional grid, I would then need to calculate how many points fall into that cell and sum all their z values. This gives me a numerical value associated with each cell.

I thought about using binned_statistic from scipy.stats, but I have no idea how to set the options to accomplish my task. Any suggestions? Tools other than binned_statistic are also welcome.

Upvotes: 1

Views: 2186

Answers (2)

wwii

Reputation: 23743

Establish the edges of the cells, iterate over them, and use boolean indexing to extract the z values in each cell. Keep the sums in a list, then convert the list to an array and reshape it.

import itertools
import numpy as np
x = np.array([0.1, 0.1, 0.1, 0.6, 1.2, 2.1])
y = np.array([2.1, 2.6, 2.1, 2.1, 3.4, 4.7])
z = np.array([2., 3., 5., 7., 10, 20])


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)  # Python 3; on Python 2 use itertools.izip

# Integer cell edges that cover the data
minx, maxx = int(min(x)), int(max(x)) + 1
miny, maxy = int(min(y)), int(max(y)) + 1

result = []
x_edges = pairwise(range(minx, maxx + 1))
for xleft, xright in x_edges:
    # Points whose x lies in the half-open interval [xleft, xright)
    xmask = np.logical_and(x >= xleft, x < xright)
    y_edges = pairwise(range(miny, maxy + 1))
    for yleft, yright in y_edges:
        ymask = np.logical_and(y >= yleft, y < yright)
        # z values of the points inside this cell
        cell = z[np.logical_and(xmask, ymask)]
        result.append(cell.sum())

# Rows follow the x cells, columns follow the y cells
result = np.array(result).reshape((maxx - minx, maxy - miny))


>>> result
array([[ 17.,   0.,   0.],
       [  0.,  10.,   0.],
       [  0.,   0.,  20.]])

Unfortunately, this uses no numpy vectorization magic.
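For comparison, a vectorized variant is possible with np.histogram2d, which accepts a weights argument. The sketch below is an assumption about what a loop-free version could look like, not part of the answer above: it reuses the same sample data, builds integer edges with np.floor (equivalent to the int() calls only for non-negative data), and calls the function twice, once for the counts and once for the z sums.

```python
import numpy as np

x = np.array([0.1, 0.1, 0.1, 0.6, 1.2, 2.1])
y = np.array([2.1, 2.6, 2.1, 2.1, 3.4, 4.7])
z = np.array([2., 3., 5., 7., 10., 20.])

# Unit-width cell edges covering the data (assumed grid, as in the loop version)
x_edges = np.arange(np.floor(x.min()), np.floor(x.max()) + 2)
y_edges = np.arange(np.floor(y.min()), np.floor(y.max()) + 2)

# Number of points per cell
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

# Sum of z per cell: each point contributes its z value as a weight
sums, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=z)

print(counts)
print(sums)
```

One difference from the loop: np.histogram2d treats the last bin on each axis as closed on the right, so a point lying exactly on the outermost edge is still counted.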

Upvotes: 1

Bill Bell

Reputation: 21643

Assuming I understand correctly, you can get what you need by using the expand_binnumbers parameter of binned_statistic_2d:

from scipy.stats import binned_statistic_2d
import numpy as np

x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
z = [2.,3.,5.,7.]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

ret = binned_statistic_2d(x, y, None, 'count', bins=[binx,biny], \
    expand_binnumbers=True)

print (ret.statistic)

print (ret.binnumber)

sums = np.zeros([len(binx) - 1, len(biny) - 1])

for i in range(len(x)):
    # binnumber is 1-based (0 and len(edges) mark out-of-range points),
    # so subtract 1 to get array indices
    m = ret.binnumber[0][i] - 1
    n = ret.binnumber[1][i] - 1
    sums[m][n] += z[i]

print (sums)

This is just an expansion of one of the examples. Here's the output: the counts, the bin numbers, and the sums.

[[ 2.  1.]
 [ 1.  0.]]
[[1 1 1 2]
 [1 2 1 1]]
[[ 7.  3.]
 [ 7.  0.]]
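
As a possible shortcut, binned_statistic_2d can also be asked for the per-cell sums directly by passing z as the values and statistic='sum'. The following is a minimal sketch on the same sample data; it assumes that only the counts and sums are needed and not the per-point bin numbers.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
z = [2., 3., 5., 7.]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# Number of points in each cell (values are ignored for 'count')
counts = binned_statistic_2d(x, y, None, statistic='count',
                             bins=[binx, biny]).statistic

# Sum of the z values in each cell
sums = binned_statistic_2d(x, y, z, statistic='sum',
                           bins=[binx, biny]).statistic

print(counts)
print(sums)
```

Both results are arrays of shape (len(binx) - 1, len(biny) - 1), with rows following the x bins and columns following the y bins.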

Upvotes: 1
