How to sample points from a data set using a grid?

Question

So I have some data with around a million (r, phi) coordinates, along with their intensities. I want to sample this data in a grid pattern so I can reduce memory used, and plot faster. However I want to sample the data in X,Y as I will be converting the coordinates to (X,Y) coordinates to plot them.

I was thinking I could use a meshgrid to come up with a template I'd like to sample, but I'm stuck on the next step.

I can't seem to find anything useful searching on google or here, but apologies if this is too simple a question!

I'm using numpy and my data is stored as three seperate arrays right now. I was planning to use np.meshgrid and later scipy.interpolate.griddata for interpolation.

r, phi and intensity are all np.arrays with shape (million,)

e.g.

r = array([1560.8, 1560.8003119, 1560.8006238, ..., 3556.831746,
           3558.815873 , 3560.8      ])

I started with this;

r = data[:, 0]  # radius
phi = data[:, 1]  # altitude angle
h2o = data[:, 2]  # intensity

x = r * np.sin(phi)  # It's a left handed coordinate system
z = r * np.cos(phi)

And for the sampling grid I have got this;

Xscale = np.linspace(min(x), max(x), 1000)
Zscale = np.linspace(min(z), max(z), 1000)

[X, Z] = np.meshgrid(Xscale, Zscale)

manu190466 · Accepted Answer

It would have been nice if you have provided some data to work on. It doesn't matter, we will create some.

Lets create x,y values from r,theta arbitrary values :

import numpy as np
import matplotlib.pyplot as plt

theta=np.linspace(0.,50.,1000)
r=np.linspace(5.,10,1000)

x=r*np.sin(theta)
y=r*np.cos(theta)

plt.plot(x,y,linestyle='',marker='.')

The plot gives :

Now add arbitrary intensity values :

intensity=np.sqrt(x**2+y**2)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, intensity)

The scatter plot gives :

If I understand well we should not be far from your starting point. We have now 3 arrays with 1000 values. We are going to reduce it to a 20x20 mesgrid. We have to first create the x and y bins, then call the binned_statistic_2d method from scipy and that's it.

import scipy.stats as stats

binx=np.linspace(-10.,10.,20)
biny=np.linspace(-10.,10.,20)

ret = stats.binned_statistic_2d(x, y, intensity, 'mean', bins=[binx,biny])

Z=ret.statistic
Z = np.ma.masked_invalid(Z) # allow to mask Nan values got in bins where there is no value
X, Y = np.meshgrid(binx,biny)

plt.pcolor(X,Y,Z)
plt.show()

The pcolor plot gives :

As requested in your comment, we can now go back to the original x,y,z arrays structure.

First, we have to calculate the center coordinates of the bins

binx_centers=(binx[1:] + binx[:-1])/2
biny_centers=(biny[1:] + biny[:-1])/2
Xcenters, Ycenters = np.meshgrid(binx_centers,biny_centers)

Then we can get the not masked values (see explanation above)

xnew=np.ma.masked_array(Xcenters, Z.mask).compressed()
ynew=np.ma.masked_array(Ycenters, Z.mask).compressed()
znew=Z.compressed()

We can check the new size :

print(znew.shape)

Gives only 235 values (instead of 1000.):

(235L,)

And the new scatter plot with the compressed values :

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xnew, ynew, znew)

We obtain :

How to sample points from a data set using a grid?

Answers (1)

Related Questions