Reputation: 3919
Suppose I have two lists:
x1 = [1,2,3,4,5,6,7,8,1,10]
x2 = [2,4,2,1,1,1,1,1,2,1]
Here, each index i of the list is a point in time, and x2[i] denotes the number of times (frequency) that x1[i] was observed at time i. Note also that x1[0] = 1 and x1[8] = 1, so the value 1 has a total frequency of 4 (= x2[0] + x2[8]).
How do I efficiently turn this into a histogram? The easy way is below, but it is probably inefficient (it creates a third object and loops in Python), which would hurt since my data is gigantic.
import numpy as np
import matplotlib.pyplot as plt
x3 = []
for i in range(10):
    for j in range(x2[i]):
        x3.append(i)
hist, bins = np.histogram(x1,bins = 10)
width = 0.7*(bins[1]-bins[0])
center = (bins[:-1]+bins[1:])/2
plt.bar(center, hist, align = 'center', width = width)
plt.show()
Upvotes: 2
Views: 3301
Reputation: 87366
The best way to do this is to use the weights kwarg on np.histogram, which will also deal with arbitrary bin sizes and non-integer values in x1:
vals, bins = np.histogram(x1, bins=10, weights=x2)
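As a minimal sketch of the weights approach on the data from the question: the explicit half-integer bin edges are my own choice for illustration, so that each integer value gets its own bin.

```python
import numpy as np

x1 = [1, 2, 3, 4, 5, 6, 7, 8, 1, 10]
x2 = [2, 4, 2, 1, 1, 1, 1, 1, 2, 1]

# Assumed bin edges 0.5, 1.5, ..., 10.5: one bin centred on each integer 1..10.
edges = np.arange(0.5, 11.5, 1.0)

# Each x1[i] contributes x2[i] to its bin instead of contributing 1.
vals, bins = np.histogram(x1, bins=edges, weights=x2)

print(vals)  # the value 1 appears twice in x1, so its bin holds 2 + 2 = 4
```

No expanded copy of the data is built here, so memory use should stay proportional to the number of bins rather than to the total frequency.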
If you just need to accumulate based on integer values you can create your histogram in one pass:
# or use a list, but I like numpy and you have it
new_array = np.zeros(max(x1))  # one slot per integer value 1..max(x1)
for ind, w in zip(x1, x2):
    # -1 because your events seem to start at 1, not 0
    new_array[ind - 1] += w
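If the values in x1 are non-negative integers, the same accumulation can probably be done without a Python loop. The sketch below uses np.bincount with its weights argument, which is a different technique from the loop above. Note that it indexes by the value itself, so slot 0 stays empty instead of everything being shifted by one.

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 10])
x2 = np.array([2, 4, 2, 1, 1, 1, 1, 1, 2, 1])

# counts[v] is the summed weight of every entry of x1 equal to v.
counts = np.bincount(x1, weights=x2)

print(counts[1])   # total frequency of the value 1 (2 + 2)
print(counts[1:])  # drop the unused slot 0 to mirror the ind - 1 layout
```

This only applies to integer data; for non-integer values or custom bin widths the weights kwarg on np.histogram is the safer choice.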
If you really want to do this with lists you can use the list comprehension
[_x for val, w in zip(x1, x2) for _x in [val]*w]
which returns
[1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 6, 7, 8, 1, 1, 10]
As a side note, it is worth understanding how to efficiently compute histograms by hand:
import numpy as np

# Python 3: / is true division and zip is lazy, so no extra imports are needed
num_new_bins = 5
new_min = 0
new_max = 10
re_binned = np.zeros(num_new_bins)
for v, w in zip(x1, x2):
    # figure out what new bin the value should go into
    ind = int(num_new_bins * (v - new_min) / (new_max - new_min))
    # make sure the value really falls into the new range
    if ind < 0 or ind >= num_new_bins:
        # overflow: skip values that land outside the bins
        continue
    # add the weighting to the proper bin
    re_binned[ind] += w
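To sanity-check a hand-rolled binning like this, it can help to compare it against np.histogram with the same range and weights. The sketch below wraps the idea in a helper (rebin_weighted is a hypothetical name, not from the answer) that skips values outside the range. One difference to expect: np.histogram treats the last bin as closed on the right, so a value equal to the upper limit is counted there, while this helper drops it.

```python
import numpy as np

def rebin_weighted(values, weights, num_bins, lo, hi):
    """Accumulate weights into num_bins equal-width bins covering [lo, hi)."""
    out = np.zeros(num_bins)
    for v, w in zip(values, weights):
        if v < lo or v >= hi:
            continue  # outside [lo, hi): treated as overflow and skipped
        # Clamp to the last bin in case rounding pushes a value just below hi over.
        ind = min(int(num_bins * (v - lo) / (hi - lo)), num_bins - 1)
        out[ind] += w
    return out

x1 = [1, 2, 3, 4, 5, 6, 7, 8, 1, 10]
x2 = [2, 4, 2, 1, 1, 1, 1, 1, 2, 1]

by_hand = rebin_weighted(x1, x2, num_bins=5, lo=0, hi=10)
by_numpy, edges = np.histogram(x1, bins=5, range=(0, 10), weights=x2)

print(by_hand)   # the value 10 is dropped here
print(by_numpy)  # the value 10 lands in the last bin here
```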
Upvotes: 3
Reputation: 3919
One way is to use x3 = np.repeat(x1, x2) and make a histogram with x3.
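A small sketch of this, which also checks that the expanded array gives the same counts as passing weights directly. Keep in mind that np.repeat materialises sum(x2) elements, so with very large frequencies the expanded array may be much bigger than the inputs.

```python
import numpy as np

x1 = [1, 2, 3, 4, 5, 6, 7, 8, 1, 10]
x2 = [2, 4, 2, 1, 1, 1, 1, 1, 2, 1]

# Repeat x1[i] exactly x2[i] times.
x3 = np.repeat(x1, x2)

hist_repeat, bins_repeat = np.histogram(x3, bins=10)
hist_weights, bins_weights = np.histogram(x1, bins=10, weights=x2)

print(x3)
print(np.array_equal(hist_repeat, hist_weights))
```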
Upvotes: -1
Reputation: 504
It seems your binning has a problem: the count for 2 should be 4, shouldn't it? Here is some code. It creates one extra array, but that array is filled in a single pass over the data. Hope it helps.
import numpy as np
import matplotlib.pyplot as plt
x1 = [1,2,3,4,5,6,7,8,1,10]
x2 = [2,4,2,1,1,1,1,1,2,1]
#your method
x3 = []
for i in range(10):
    for j in range(x2[i]):
        x3.append(i)
plt.subplot(1,2,1)
hist, bins = np.histogram(x1,bins = 10)
width = 0.7*(bins[1]-bins[0])
center = (bins[:-1]+bins[1:])/2
plt.bar(center, hist, align = 'center', width = width)
plt.title("Posted Method")
#plt.show()
#New Method
new_array=np.zeros(len(x1))
for count, p in enumerate(x1):
    new_array[p - 1] += x2[count]
plt.subplot(1,2,2)
hist, bins = np.histogram(x1,bins = 10)
width = 0.7*(bins[1]-bins[0])
center = (bins[:-1]+bins[1:])/2
plt.bar(center, new_array, align = 'center', width = width)
plt.title("New Method")
plt.show()
And here is the output (image not preserved): two side-by-side bar charts titled "Posted Method" and "New Method".
Upvotes: 1