alex.l

Reputation: 29

Cumulative histogram for 2D data in Python

My data consists of a 2-D array of masses and distances. I want to produce a plot where the x-axis is distance and the y-axis is the number of data elements with distance <= x (i.e. a cumulative histogram plot). What is the most efficient way to do this in Python?

PS: the masses are irrelevant since I have already filtered by mass, so all I am trying to produce is a plot using the distance data.

[Image: example of the desired plot]

Upvotes: 0

Views: 298

Answers (2)

alex.l

Reputation: 29

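This is what I figured out I can do, given a 1D array of data:

import matplotlib.pyplot as plt
import numpy as np

plt.figure()
counts = np.ones(len(data))  # each element contributes one count
plt.step(np.sort(data), counts.cumsum())  # running count against sorted distances
plt.show()

This also works with duplicate elements: each occurrence adds one count at the same x, so the step rises by the number of duplicates at that distance.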
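To connect this back to the 2-D array in the question, here is a minimal sketch. The column layout (masses in column 0, distances in column 1) and the helper names `cumulative_counts` and `plot_cumulative` are assumptions made for illustration, not something stated in the question.

```python
import numpy as np

def cumulative_counts(distances):
    """Return the sorted distances and a running count 1, 2, ..., n."""
    xs = np.sort(np.asarray(distances, dtype=float))
    # Running count over the sorted values; where a distance is duplicated,
    # the largest count at that x is the number of elements <= x.
    ys = np.arange(1, xs.size + 1)
    return xs, ys

def plot_cumulative(xs, ys):
    """Draw the cumulative count as a step plot (hypothetical helper)."""
    import matplotlib.pyplot as plt
    plt.step(xs, ys, where="post")
    plt.xlabel("distance")
    plt.ylabel("number of elements with distance <= x")
    plt.show()

# Assumed layout: column 0 holds masses, column 1 holds distances.
data = np.array([[1.0, 3.0], [2.0, 1.0], [1.5, 3.0], [0.7, 2.0]])
xs, ys = cumulative_counts(data[:, 1])
```

Calling `plot_cumulative(xs, ys)` would then draw the plot. `np.arange(1, n + 1)` gives the same values as `np.ones(n).cumsum()`, just as integers.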


Upvotes: 0

JohanC

Reputation: 80299

You can combine numpy.cumsum() and plt.step():

import matplotlib.pyplot as plt
import numpy as np

N = 15
distances = np.random.uniform(1, 4, N).cumsum()  # increasing example distances
counts = np.random.uniform(0.5, 3, N)  # example weight per distance
plt.step(distances, counts.cumsum())
plt.show()


Alternatively, plt.bar can be used to draw a histogram, with the widths defined by the differences between successive distances. An extra distance needs to be appended to give the last bar a width.

plt.bar(distances, counts.cumsum(), width=np.diff(distances, append=distances[-1]+1), align='edge')
plt.autoscale(enable=True, axis='x', tight=True)  # make x-axis tight

Instead of appending a value, a value such as zero could be prepended, depending on the exact interpretation of the data. With a negative width and align='edge', each bar extends to the left of its distance.

plt.bar(distances, counts.cumsum(), width=-np.diff(distances, prepend=0), align='edge')
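The examples above use randomly generated `distances` and `counts`. For raw distance data where each element counts once, one way to get comparable arrays is `np.unique` with `return_counts=True`. This is only a sketch, and the `raw` values below are made up for illustration.

```python
import numpy as np

# Made-up distance data containing duplicates.
raw = np.array([3.0, 1.0, 3.0, 2.0, 1.0, 1.0])

# Sorted unique distances and how many elements fall on each one.
distances, counts = np.unique(raw, return_counts=True)

# Number of elements with distance <= each unique distance.
cumulative = counts.cumsum()
```

Here `distances` and `cumulative` could be passed to plt.step or plt.bar in place of `distances` and `counts.cumsum()` in the snippets above.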

Upvotes: 2
