user2763361

Reputation: 3919

Histogram with separate list denoting frequency

Suppose I have two lists:

    x1 = [1,2,3,4,5,6,7,8,1,10]
    x2 = [2,4,2,1,1,1,1,1,2,1]

Here, each index i of the list is a point in time, and x2[i] denotes the number of times (frequency) that x1[i] was observed at time i. Note also that x1[0] = 1 and x1[8] = 1, giving the value 1 a total frequency of 4 (= x2[0] + x2[8]).
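As an illustration (not part of the original question), the per-value totals described above, such as 4 for the value 1, can be computed with np.unique and np.bincount. This is a sketch that also works for non-integer values in x1:

```python
import numpy as np

x1 = [1, 2, 3, 4, 5, 6, 7, 8, 1, 10]
x2 = [2, 4, 2, 1, 1, 1, 1, 1, 2, 1]

# Map each observation to the index of its distinct (sorted) value.
values, inverse = np.unique(x1, return_inverse=True)
# Sum the frequencies that belong to the same distinct value.
totals = np.bincount(inverse, weights=x2)

print(values.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 10]
print(totals.tolist())  # [4.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```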

How do I efficiently turn this into a histogram? The easy way is below, but it is probably inefficient (it creates a third object and loops over every observation), which would hurt me since my data is very large.

import numpy as np
import matplotlib.pyplot as plt

# expand each value x1[i] by its frequency x2[i]
x3 = []
for i in range(len(x1)):
    for j in range(x2[i]):
        x3.append(x1[i])

hist, bins = np.histogram(x1,bins = 10)
width = 0.7*(bins[1]-bins[0])
center = (bins[:-1]+bins[1:])/2
plt.bar(center, hist, align = 'center', width = width)
plt.show()

Upvotes: 2

Views: 3301

Answers (3)

tacaswell

Reputation: 87366

The best way to do this is to use the weights keyword argument of np.histogram (see its documentation), which also deals with arbitrary bin sizes and non-integer values in x1:

vals, bins = np.histogram(x1, bins=10, weights=x2)
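A minimal sketch of the weighted call, assuming integer-valued x1. The half-integer bin edges are an illustrative choice (not from the answer) that gives exactly one bin per integer value:

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 10])
x2 = np.array([2, 4, 2, 1, 1, 1, 1, 1, 2, 1])

# One bin per integer: edges at 0.5, 1.5, ..., 10.5 (illustrative choice).
edges = np.arange(x1.min() - 0.5, x1.max() + 1.5, 1.0)
# Each x1 value contributes its frequency from x2 instead of 1.
vals, bins = np.histogram(x1, bins=edges, weights=x2)

print(vals.tolist())  # weighted counts for the values 1, 2, ..., 10
```

With bins=10, as in the line above, the equal-width edges happen to isolate each integer for this particular data (range 1 to 10); for other ranges, equal-width bins can merge neighbouring integers, which the half-integer edges avoid.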

If you only need to accumulate counts for integer values, you can build your histogram in one pass:

import numpy as np

new_array = np.zeros(max(x1))  # or use a list, but I like numpy and you have it
for ind, w in zip(x1, x2):
    # -1 because your events seem to start at 1, not 0
    new_array[ind - 1] += w
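The same one-pass accumulation can be vectorized. A sketch, assuming x1 holds non-negative integers (a requirement of np.bincount); np.add.at is shown as an alternative:

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 10])
x2 = np.array([2, 4, 2, 1, 1, 1, 1, 1, 2, 1])

# counts[k] is the total frequency of the value k (index 0 stays empty here).
counts = np.bincount(x1, weights=x2)
# Drop index 0 to match the "-1" offset used in the loop above.
new_array = counts[1:]

# Alternative: unbuffered in-place addition that accumulates repeated indices.
alt = np.zeros(x1.max())
np.add.at(alt, x1 - 1, x2)
```

A plain new_array[x1 - 1] += x2 would not work here, because buffered fancy-index assignment keeps only one update for the repeated value 1; np.bincount and np.add.at both accumulate duplicates.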

If you really want to do this with plain lists, you can use a list comprehension:

[_x for val, w in zip(x1, x2) for _x in [val]*w]

which returns

[1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 6, 7, 8, 1, 1, 10]

As a side note, it is worth understanding how to efficiently compute histograms by hand:

import math
import numpy as np

num_new_bins = 5
new_min = 0
new_max = 10
re_binned = np.zeros(num_new_bins)
for v, w in zip(x1, x2):
    # figure out which new bin the value should go into
    ind = math.floor(num_new_bins * (v - new_min) / (new_max - new_min))
    # make sure the value really falls into the new range
    if ind < 0 or ind >= num_new_bins:
        # overflow/underflow: skip values outside [new_min, new_max)
        continue
    # add the weighting to the proper bin
    re_binned[ind] += w
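The rebinning loop can also be written without a Python-level loop. This is a sketch under the same assumptions (num_new_bins, new_min, new_max as above), using np.floor for the bin indices and np.bincount to sum the weights:

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 10])
x2 = np.array([2, 4, 2, 1, 1, 1, 1, 1, 2, 1])

num_new_bins = 5
new_min = 0
new_max = 10

# Bin index for every value at once.
ind = np.floor(num_new_bins * (x1 - new_min) / (new_max - new_min)).astype(int)
# Keep only values that land inside [new_min, new_max).
in_range = (ind >= 0) & (ind < num_new_bins)
# Sum the weights per bin; minlength keeps empty trailing bins.
re_binned = np.bincount(ind[in_range], weights=x2[in_range], minlength=num_new_bins)

# For comparison: np.histogram closes its last bin, so the value 10 is counted.
ref, _ = np.histogram(x1, bins=num_new_bins, range=(new_min, new_max), weights=x2)
```

Here re_binned drops the value 10 as overflow, while ref includes it in the last bin, because np.histogram treats every bin as half-open except the right-most one.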

Upvotes: 3

user2763361

Reputation: 3919

One way is to use x3 = np.repeat(x1,x2) and make a histogram with x3.
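A short sketch of this approach, checked against the weights version from the first answer. Note that x3 holds x2.sum() elements, so memory grows with the total frequency rather than with len(x1):

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 10])
x2 = np.array([2, 4, 2, 1, 1, 1, 1, 1, 2, 1])

# Expand each value by its frequency: len(x3) == x2.sum() == 16 here.
x3 = np.repeat(x1, x2)
hist_repeat, edges = np.histogram(x3, bins=10)

# Same counts without materialising x3 (min and max of x1 and x3 match here).
hist_weighted, _ = np.histogram(x1, bins=10, weights=x2)
```

If some entries of x2 are zero, the expanded array may have a different minimum or maximum than x1, so passing an explicit range= to both calls keeps the bin edges identical.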

Upvotes: -1

Koustav Ghosal

Reputation: 504

It seems your binning has a problem: the count for the value 2 should be 4, shouldn't it? Here is some code. It creates one extra array, but that array is filled in a single pass over the data. Hope it helps.

import numpy as np
import matplotlib.pyplot as plt

x1 = [1,2,3,4,5,6,7,8,1,10]
x2 = [2,4,2,1,1,1,1,1,2,1]

#your method
x3 = []
for i in range(len(x1)):
    for j in range(x2[i]):
        x3.append(x1[i])
plt.subplot(1,2,1)
hist, bins = np.histogram(x1,bins = 10)
width = 0.7*(bins[1]-bins[0])
center = (bins[:-1]+bins[1:])/2
plt.bar(center, hist, align = 'center', width = width)
plt.title("Posted Method")
#plt.show()

#New Method
# one slot per value 1..max(x1); each value's frequency is added to its slot
new_array = np.zeros(max(x1))
for count, p in enumerate(x1):
    new_array[p-1] += x2[count]
plt.subplot(1,2,2)
hist, bins = np.histogram(x1,bins = 10)
width = 0.7*(bins[1]-bins[0])
center = (bins[:-1]+bins[1:])/2
plt.bar(center, new_array, align = 'center', width = width)
plt.title("New Method")
plt.show()

And here is the output:

[Image: side-by-side bar charts titled "Posted Method" and "New Method"]

Upvotes: 1
