Stefan D

Reputation: 1249

Matplotlib histogram

I am trying to plot a simple histogram. I have processed my data as a list: X = [30, 2728, 2894, 2582, 2309, 2396, 2491, 2453, 2382, 2325, 2225, 2359, 2138...]

where every position corresponds to the number of items with that value (so 30 items for 0, 2728 for 1, etc.). If I plot this list as a bar chart I get the desired result, but the resolution is too high (i.e. every value is its own bucket). What I want is to merge buckets so that my X values are 0, 1-10, 10-50, 50-150, 150-500 and my Y values are the sums of the items in each range: for 0 the y value is 30, for 1-10 it is sum(2728, 2894, 2582, 2309, 2396, 2491, 2453, 2382, 2325, 2225), etc.

I tried this way:

    plt.hist(X, bins=[0, 1, 10])

but I don't get the desired result. I expect one bar for 0-1 with y=30 and a second bar for 1-10 with y=24785, but that's not what it plots.

What's the best way to do this?

Upvotes: 0

Views: 1953

Answers (2)

Julien

Reputation: 203

You want to merge the buckets into a custom list: 0, 1-10, 10-50, 50-150, 150-500. plt.hist does accept a list of bin edges, but it bins the entries of X themselves, so it will not sum your counts. I would suggest manually counting how many values fall in each of the custom bins. It helps a lot to convert your list X into a NumPy array using np.array().

    import numpy as np
    import matplotlib.pyplot as plt

    X = np.array([30, 2728, 2894, 2582, 2309, 2396, 2491, 2453, 2382, 2325, 2225, 2359, 2138])
    # Custom bin edges; the last bin is open-ended (500 could be replaced by np.inf as well)
    bin_list = np.array([0, 1, 10, 50, 150, 500, np.inf])
    plot_bin = np.zeros(len(bin_list) - 1)
    for bin_n in range(len(bin_list) - 1):
        # Count the entries of X that lie in [bin_list[bin_n], bin_list[bin_n + 1])
        plot_bin[bin_n] = np.sum((X >= bin_list[bin_n]) & (X < bin_list[bin_n + 1]))

    # Use the lower edge of each bin as its tick label
    str_bin_list_lower = [str(a) for a in bin_list[:-1]]
    x_ticks = np.arange(len(bin_list) - 1)
    plt.bar(x_ticks, plot_bin)  # bars are centered on x_ticks by default
    plt.xticks(x_ticks, str_bin_list_lower)
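
The loop above counts the entries of X themselves, which is also what plt.hist does (it uses np.histogram for the counting). A quick check, using only the 13 sample values from the question, suggests why the original plt.hist(X, bins=[0, 1, 10]) call showed nothing useful:

```python
import numpy as np

# Sample values from the question; each entry is a count, not an observation.
X = np.array([30, 2728, 2894, 2582, 2309, 2396, 2491,
              2453, 2382, 2325, 2225, 2359, 2138])

# Same bin edges as in plt.hist(X, bins=[0, 1, 10]).
counts, edges = np.histogram(X, bins=[0, 1, 10])

print(counts)   # number of entries of X in [0, 1) and in [1, 10]
print(X.min())  # smallest entry of X
```

Every entry of X is at least 30, so no entry lands in either bin and both bars have height zero.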

Edited: I misunderstood your question. You have bin edges such as [0, 10, 50] and want to add up the counts for [0], [1-10], [10-50], etc. (each value is counted once, so the second range effectively starts at 11). The thing to watch is how Python slicing works: the end index is exclusive. For example, list(range(10))[0:5] is [0, 1, 2, 3, 4] and list(range(10))[5:10] is [5, 6, 7, 8, 9]. You need to account for this when you build your bin list. The binning process then becomes:

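    import numpy as np

    X = np.array([30, 2728, 2894, 2582, 2309, 2396, 2491, 2453, 2382, 2325, 2225, 2359, 2138])
    # Slice starts for the bins 1-10, 11-50, 51-150, 151-500 and 501+ (integers, so they can index X)
    bin_list = np.array([0, 10, 50, 150, 500]) + 1
    plot_bin = np.zeros(len(bin_list))
    for bin_n in range(len(bin_list)):
        if bin_n == len(bin_list) - 1:
            # The last bin is open-ended: everything from 501 upward
            plot_bin[bin_n] = np.sum(X[bin_list[bin_n]:])
        else:
            # The slice end is exclusive, so this covers bin_list[bin_n] .. bin_list[bin_n + 1] - 1
            plot_bin[bin_n] = np.sum(X[bin_list[bin_n]:bin_list[bin_n + 1]])
    # Prepend the count for the value 0
    plot_bin = np.insert(plot_bin, 0, X[0])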
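
The merged sums still need to be drawn. One possible sketch is a bar chart with one labelled bar per bucket, assuming the buckets 0, 1-10, 11-50, 51-150, 151-500 and 501+. The names `starts`, `stops`, `labels` and `merged` are introduced here for illustration only, and with just the 13 sample values the later buckets come out as zero.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.array([30, 2728, 2894, 2582, 2309, 2396, 2491,
              2453, 2382, 2325, 2225, 2359, 2138])

# First value of each bucket; a bucket runs up to the next start (exclusive).
starts = [0, 1, 11, 51, 151, 501]
stops = starts[1:] + [None]  # None leaves the last bucket open-ended
labels = ["0", "1-10", "11-50", "51-150", "151-500", "501+"]

# Sum the counts inside each slice of X.
merged = [int(X[lo:hi].sum()) for lo, hi in zip(starts, stops)]

# One equally wide bar per bucket, labelled with its range.
fig, ax = plt.subplots()
positions = np.arange(len(merged))
bars = ax.bar(positions, merged)
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("value")
ax.set_ylabel("number of items")
```

Calling plt.show() or fig.savefig(...) afterwards renders the figure. Equal-width bars are a deliberate choice here: the buckets differ a lot in range, so drawing them to scale would make the first two hard to see.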

Upvotes: 0

Sergey Bushmanov

Reputation: 25249

For data preprocessed the way yours is, the right way to plot it is:

    import matplotlib.pyplot as plt

    X = [30, 2728, 2894, 2582, 2309, 2396, 2491, 2453, 2382, 2325, 2225, 2359, 2138]
    plt.bar(range(len(X)), X);


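However, if you have the raw observations rather than pre-computed counts, matplotlib provides an even easier and more straightforward way to plot a histogram:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.random.randn(1000)
    plt.hist(x, bins=30);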

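When the data are already counted, as in the question, plt.hist can still be used by passing the values 0, 1, 2, ... as the data and the counts as `weights`. The following is a sketch under the assumption that the intended buckets are 0, 1-10, 11-50, 51-150 and 151-500; `values` and `edges` are illustrative names, not part of the question.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.array([30, 2728, 2894, 2582, 2309, 2396, 2491,
              2453, 2382, 2325, 2225, 2359, 2138])

# The value each count belongs to: 0, 1, 2, ...
values = np.arange(len(X))

# Bins are half-open [lo, hi); only the last bin also includes its right edge.
edges = [0, 1, 11, 51, 151, 501]

# Each value contributes its count (the weight) to the bin it falls in.
n, bins, patches = plt.hist(values, bins=edges, weights=X)
print(n)  # summed counts per bucket
```

Note that plt.hist draws each bar with its true width, so the bar for 0 is one unit wide while the last one spans 350 units. If equally wide bars are preferred, the sums in `n` can be passed to plt.bar instead.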

If you want more direct control over binning, you may want to switch to Pandas and try pd.cut, where you can define your own bins:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'x': np.random.randint(0, 100, 1000)})
    # Bins are right-closed by default: (1, 10], (10, 20], (20, 100]
    factor = pd.cut(df.x, [1, 10, 20, 100])
    df.groupby(factor, observed=False).x.count().plot(kind='bar', rot=45);

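The same pd.cut idea can probably be applied to pre-counted data as well, by cutting the values (the list positions) and summing the counts per bucket instead of counting rows. A minimal sketch, assuming the buckets 0, 1-10, 11-50, 51-150 and 151-500; the column names `value` and `count` and the bucket labels are hypothetical.

```python
import numpy as np
import pandas as pd

X = [30, 2728, 2894, 2582, 2309, 2396, 2491,
     2453, 2382, 2325, 2225, 2359, 2138]

# One row per value: the value itself and how many items have it.
df = pd.DataFrame({"value": np.arange(len(X)), "count": X})

# right=False makes the bins left-closed: [0, 1), [1, 11), [11, 51), ...
edges = [0, 1, 11, 51, 151, 501]
labels = ["0", "1-10", "11-50", "51-150", "151-500"]
buckets = pd.cut(df["value"], bins=edges, right=False, labels=labels)

# Sum the counts in each bucket; observed=False keeps the empty buckets.
merged = df.groupby(buckets, observed=False)["count"].sum()
print(merged)
```

From there, merged.plot(kind='bar', rot=45) draws the chart in the same way as the example above.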

Upvotes: 1
