user3555455

Reputation: 487

Having trouble plotting 2-D histogram with numpy.histogram2d and matplotlib

I have a massive data set where I need to split my plot into a grid and count the number of points within each grid square. I'm following a method outlined here:

with a stripped-down version of my code below:

import numpy as np
import matplotlib.pyplot as plt

x = [ 1.83259571, 1.76278254, 1.38753676, 1.6406095, 1.34390352, 1.23045712, 1.85877565, 1.26536371, 0.97738438]

y = [ 0.04363323, 0.05235988, 0.09599311, 0.10471976, 0.1134464, 0.13962634, 0.17453293, 0.20943951, 0.23561945]

gridx = np.linspace(min(x),max(x),11)
gridy = np.linspace(min(y),max(y),11)

grid, _, _ = np.histogram2d(x, y, bins=[gridx, gridy])

plt.figure()
plt.plot(x, y, 'ro')
plt.grid(True)

plt.figure()
plt.pcolormesh(gridx, gridy, grid)
plt.plot(x, y, 'ro')
plt.colorbar()

plt.show()

The problem is that the grid colors some cells as if they contained points when no points fall inside them; conversely, some cells that do contain data points are shown as empty.

What might be causing this problem? Also, sorry for not attaching the plot, I'm a new user and my reputation isn't high enough.

UPDATE: Here's code that generates 100 random points and attempts to plot them in a 2-D histogram:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(100)

y = np.random.rand(100)

gridx = np.linspace(0,1,11)
gridy = np.linspace(0,1,11)

grid, __, __ = np.histogram2d(x, y, bins=[gridx, gridy])

plt.figure()
plt.plot(x, y, 'ro')
plt.grid(True)

plt.figure()
plt.pcolormesh(gridx, gridy, grid)
plt.plot(x, y, 'ro')
plt.colorbar()

plt.show()

Yet when I run it I have the same problem as before: the locations of the points and the colors corresponding to point-location-density don't agree. Does this happen when anyone runs this code for themselves?

SECOND UPDATE

And at the risk of beating a dead horse, here's code for a parametric plot:

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0,1,100)
x = np.sin(t)
y = np.cos(t)

gridx = np.linspace(0,1,11)
gridy = np.linspace(0,1,11)

#grid, __, __ = np.histogram2d(x, y, bins=[gridx, gridy])
grid, __, __ = np.histogram2d(x, y)

plt.figure()
plt.plot(x, y, 'ro')
plt.grid(True)

plt.figure()
plt.pcolormesh(gridx, gridy, grid)
plt.plot(x, y, 'ro')
plt.colorbar()

plt.show()

which makes me think this is all some kind of weird scaling issue. Still totally lost though...

Upvotes: 5

Views: 8233

Answers (2)

Kyle Bogdan

Reputation: 1

Referencing numpy histogram2d's documentation...

The careful reader will note that the parameters are backwards.

histogram2d(y, x, bins=(xedges, yedges))

Compute the bi-dimensional histogram of two data samples.

Parameters

x : array_like, shape (N,) An array containing the x coordinates of the points to be histogrammed.

y : array_like, shape (N,) An array containing the y coordinates of the points to be histogrammed.

Ergo, you supplied your x's to the y parameter of the function and your y's to the x parameter.
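A quick way to check which axis ends up where is to use a different number of bins along x and y. The sketch below uses made-up points and bin edges (not the data from the question); the shape of the returned array shows that the first index runs over the x bins and the second over the y bins.

```python
import numpy as np

# Illustrative data only: three points with small x and large y.
x = np.array([0.1, 0.2, 0.3])
y = np.array([0.9, 0.9, 0.9])

xedges = np.linspace(0, 1, 5)  # 4 bins along x
yedges = np.linspace(0, 1, 3)  # 2 bins along y

H, _, _ = np.histogram2d(x, y, bins=[xedges, yedges])

# H has one row per x bin and one column per y bin.
print(H.shape)
print(H)
```

Plotting functions that treat rows as the vertical axis will therefore draw this array with x and y swapped unless it is transposed first.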

Regards

Upvotes: 0

paisanco

Reputation: 4164

I was able to get your example to work by using imshow with interpolation instead of pcolormesh. See sample code below.

I think the problem may be that pcolormesh has a different origin convention than plot. The results of pcolormesh look like the upper left and lower right are flipped.
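A minimal sketch of an alternative that keeps pcolormesh, assuming the mismatch comes from the array orientation: np.histogram2d returns counts indexed as [x bin, y bin], while pcolormesh reads its array as [row, column], that is [y index, x index]. Transposing the counts before plotting lines the cells up with the points. The variable names counts and mesh are chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = [1.83259571, 1.76278254, 1.38753676, 1.6406095, 1.34390352,
     1.23045712, 1.85877565, 1.26536371, 0.97738438]
y = [0.04363323, 0.05235988, 0.09599311, 0.10471976, 0.1134464,
     0.13962634, 0.17453293, 0.20943951, 0.23561945]

gridx = np.linspace(min(x), max(x), 11)
gridy = np.linspace(min(y), max(y), 11)

# counts[i, j] is the number of points in x bin i and y bin j.
counts, _, _ = np.histogram2d(x, y, bins=[gridx, gridy])

fig, ax = plt.subplots()
# Transpose so that rows correspond to y bins and columns to x bins.
mesh = ax.pcolormesh(gridx, gridy, counts.T)
ax.plot(x, y, 'ro')
fig.colorbar(mesh, ax=ax)
```

Calling plt.show() afterwards displays the figure; each red point should then sit inside a colored cell.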

The result with imshow looks like:

[Image: imshow result]

The sample code:

import numpy as np
import matplotlib.pyplot as plt

def doPlot():

    x = [ 1.83259571, 1.76278254, 1.38753676, 1.6406095, 1.34390352, 1.23045712, 1.85877565, 1.26536371, 0.97738438]

    y = [ 0.04363323, 0.05235988, 0.09599311, 0.10471976, 0.1134464, 0.13962634, 0.17453293, 0.20943951, 0.23561945]

    gridx = np.linspace(min(x),max(x),11)
    gridy = np.linspace(min(y),max(y),11)

    H, xedges, yedges = np.histogram2d(x, y, bins=[gridx, gridy])

    plt.figure()
    plt.plot(x, y, 'ro')
    plt.grid(True)

    #wrong origin convention for pcolormesh?
    #plt.figure()
    #plt.pcolormesh(gridx, gridy, H)
    #plt.plot(x, y, 'ro')
    #plt.colorbar()


    plt.figure()
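    # extent maps the array onto data coordinates; H is transposed because
    # histogram2d puts x along the first axis, and origin='lower' puts row 0 at the bottom
    myextent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
    plt.imshow(H.T, origin='lower', extent=myextent, interpolation='nearest', aspect='auto')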
    plt.plot(x,y,'ro')
    plt.colorbar()

    plt.show()

if __name__=="__main__":
    doPlot()

Upvotes: 2
