My data consists of a 2-D array of masses and distances. I want to produce a plot where the x-axis is distance and the y-axis is the number of data elements with distance <= x (i.e. a cumulative histogram). What is the most efficient way to do this with Python?
PS: the masses are irrelevant, since I have already filtered by mass, so all I need is a plot of the distance data.
[Example plot omitted: number of data elements with distance <= x, plotted against distance x.]
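The question does not say how the 2-D array is laid out. As a minimal sketch, assuming a hypothetical (N, 2) layout with mass in column 0 and distance in column 1, the distances can be pulled out (and optionally filtered by mass, via the hypothetical min_mass parameter) with NumPy indexing before plotting:

```python
import numpy as np

# Hypothetical column layout (not given in the question):
# column 0 = mass, column 1 = distance.
MASS_COL, DIST_COL = 0, 1


def distances_from(arr, min_mass=None):
    """Return the distance column of a 2-D array as a sorted 1-D array.

    min_mass is an optional, hypothetical filter: rows whose mass is
    below min_mass are dropped before the distances are extracted.
    """
    arr = np.asarray(arr, dtype=float)
    if min_mass is not None:
        arr = arr[arr[:, MASS_COL] >= min_mass]  # boolean mask on the mass column
    return np.sort(arr[:, DIST_COL])


if __name__ == "__main__":
    table = np.array([[2.0, 5.0], [0.5, 1.0], [3.0, 2.5]])
    print(distances_from(table).tolist())                # [1.0, 2.5, 5.0]
    print(distances_from(table, min_mass=1.0).tolist())  # [2.5, 5.0]
```

The sorted result can be passed straight to plt.step as the x values.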
This is what I came up with, given a 1-D array of distances called data:
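import matplotlib.pyplot as plt
import numpy as np

plt.figure()
counts = np.ones(len(data))  # each element adds 1 to the running count
# where='post' holds each count until the next distance, so y = number of elements <= x
plt.step(np.sort(data), counts.cumsum(), where='post')
plt.show()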
This also works with duplicate values: each copy gets its own step, so at a repeated distance the curve rises by the number of copies.
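The step plot only places points at the data values themselves. To read off the count of elements with distance <= x at arbitrary x (for example on a regular grid), one option is to sort once and use np.searchsorted with side="right", which returns the index just past the last element equal to x. A minimal sketch; the function name count_at_most is illustrative:

```python
import numpy as np


def count_at_most(data, x):
    """Number of elements of data that are <= each value in x.

    searchsorted(side="right") on the sorted data gives the position just
    past the last element equal to x, which is exactly the count of
    elements <= x. Duplicates are therefore counted in full.
    """
    sorted_data = np.sort(np.asarray(data, dtype=float))
    return np.searchsorted(sorted_data, x, side="right")


if __name__ == "__main__":
    data = [3.0, 1.0, 2.0, 2.0]
    print(count_at_most(data, [0.5, 1.0, 2.0, 2.5, 3.0]).tolist())  # [0, 1, 3, 3, 4]
```

For example, grid = np.linspace(0, max(data), 200) followed by plt.step(grid, count_at_most(data, grid), where='post') plots the curve on that grid. Sorting costs O(N log N) once, and each lookup is a binary search.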
You can combine numpy.cumsum() and plt.step():
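import matplotlib.pyplot as plt
import numpy as np

N = 15
# example distances: a cumulative sum of positive gaps gives increasing values
distances = np.random.uniform(1, 4, N).cumsum()
# example count (weight) observed at each distance
counts = np.random.uniform(0.5, 3, N)
# where='post' keeps each running total until the next distance
plt.step(distances, counts.cumsum(), where='post')
plt.show()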
Alternatively, plt.bar can draw the histogram, with each bar's width set to the difference between successive distances. An extra distance has to be appended so that the last bar also gets a width.
plt.bar(distances, counts.cumsum(), width=np.diff(distances, append=distances[-1]+1), align='edge')
plt.autoscale(enable=True, axis='x', tight=True) # make x-axis tight
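Instead of appending a value, a zero can be prepended and the widths negated, so that each bar extends to the left of its distance (back to the previous distance, or to 0 for the first bar). Which variant fits depends on the exact interpretation of the data.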
plt.bar(distances, counts.cumsum(), width=-np.diff(distances, prepend=0), align='edge')
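If a binned cumulative histogram is acceptable instead of one step per data point, the running totals can also be computed with np.histogram followed by np.cumsum. This is a sketch; the function name binned_cumulative and the default of 10 bins are illustrative assumptions:

```python
import numpy as np


def binned_cumulative(data, bins=10):
    """Return (edges, totals) for a binned cumulative histogram.

    np.histogram uses half-open bins [a, b) except the last, which is
    closed [a, b], so totals[i] counts the elements in bins 0..i, i.e.
    those strictly below edges[i + 1] (the final total includes all).
    """
    counts, edges = np.histogram(data, bins=bins)
    return edges, np.cumsum(counts)


if __name__ == "__main__":
    edges, totals = binned_cumulative([1.0, 2.0, 2.0, 3.0], bins=[0, 1.5, 2.5, 3.5])
    print(totals.tolist())  # [1, 3, 4]
```

The result can be drawn with plt.stairs(totals, edges) (Matplotlib 3.4 or later); plt.hist(data, bins=..., cumulative=True, histtype='step') produces the equivalent plot directly.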