Huzo

Reputation: 1692

Plotting a histogram using a range of values and their frequency as a dictionary

Assume that I have the following dictionary:

scenario_summary = {'Day1': {'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0, '22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0, '23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0, '23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0, '23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0, '24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0, '24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0, '25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0, '25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0, '25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0, '26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0, '26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0, '26995-27121': 19.0, '27121-27247': 7.000000000000001, '27247-27373': 11.0, '27373-27499': 15.0, '27499-27625': 7.000000000000001, '27625-27751': 4.0, '27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0, '28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0, '28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0, '28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0, '29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0, '29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0, '30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0, '30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0, '30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0, '31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0, '31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0, '31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0, '32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0, '32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0, '33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0, '33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0, '33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0, '34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0, '34555-34681': 
0.0, '34681-34807': 0.0}}

As you can see, the dictionary maps ranges of values, stored as strings, to their frequencies. I would like to plot this as a histogram, but I don't know how to transform the strings into a form that pandas or plotly would understand. What would your approach be? Is there an easier way than hardcoding things? Or would another module make this easier?

Thanks!

Upvotes: 4

Views: 2773

Answers (3)

lk_ayyagari

Reputation: 71

Since the bins (ranges) are already defined and their counts are already aggregated, one option is to regroup the existing ranges into a smaller number of wider bins and plot the summed counts:

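import pandas as pd
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline

def plot_hist(bins, input_dict):
    # One row per range label; the labels end up in the 'index' column
    df1 = pd.DataFrame(input_dict).reset_index()
    # Split each 'low-high' label into its numeric bounds
    df1['min'] = df1['index'].apply(lambda x: x.split('-')[0]).astype(int)
    df1['max'] = df1['index'].apply(lambda x: x.split('-')[1]).astype(int)
    # Assign each original range to one of `bins` equal-width groups
    df1['group'] = pd.cut(df1['max'], bins, labels=False)
    # Merge the ranges in each group and sum their frequencies
    df2 = df1.groupby('group')[['Day1', 'min', 'max']].agg({'min': 'min', 'max': 'max', 'Day1': 'sum'}).reset_index()
    df2['range_new'] = df2['min'].astype(str) + '-' + df2['max'].astype(str)
    df2.plot(x='range_new', y='Day1', kind='bar')
    plt.show()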

Then call the function with a number of bins smaller than the number of ranges already in the dictionary (98 here). For example, to aggregate into 20 groups:

plot_hist(20,scenario_summary)

Hope it helps.
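
The regrouping idea can also be sketched without pandas. This is not the same technique as pd.cut above: instead of equal-width intervals, it merges a fixed number of consecutive ranges into one. The helper name merge_bins and its group_size parameter are hypothetical, and the sketch assumes the dictionary keys are in ascending order and the ranges are contiguous.

```python
def merge_bins(freq_by_range, group_size):
    """Merge every `group_size` consecutive 'low-high' ranges into one.

    Assumes the keys are in ascending order and the ranges are contiguous,
    as they are in scenario_summary['Day1'].
    """
    items = list(freq_by_range.items())
    merged = {}
    for start in range(0, len(items), group_size):
        chunk = items[start:start + group_size]
        low = chunk[0][0].split('-')[0]     # lower bound of the first range
        high = chunk[-1][0].split('-')[1]   # upper bound of the last range
        merged[f'{low}-{high}'] = sum(count for _, count in chunk)
    return merged


sample = {'26869-26995': 12.0, '26995-27121': 19.0,
          '27121-27247': 7.0, '27247-27373': 11.0}
coarse = merge_bins(sample, 2)
print(coarse)
```

Called as merge_bins(scenario_summary['Day1'], 5), this would reduce the 98 ranges to 20 groups, the last one holding only the 3 leftover ranges.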

Upvotes: 2

Marco13

Reputation: 54639

A histogram is basically a simple bar chart, where each bar represents a bin (usually in the form of a range) and the height of the bar is the frequency of the elements that fall into that bin.

This is exactly the data that you already have. So instead of computing values for a histogram (as plt.hist would do), you can simply pass your data to plt.bar as it is.

The code with your data, as an MCVE:

import matplotlib.pyplot as plt

scenario_summary = { 'Day1': {
    '22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0,
    '22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0,
    '23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0,
    '23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0,
    '23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0,
    '24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0,
    '24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0,
    '25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0,
    '25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0,
    '25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0,
    '26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0,
    '26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0,
    '26995-27121': 19.0, '27121-27247': 7.0, '27247-27373': 11.0,
    '27373-27499': 15.0, '27499-27625': 7.0, '27625-27751': 4.0,
    '27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0,
    '28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0,
    '28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0,
    '28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0,
    '29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0,
    '29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0,
    '30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0,
    '30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0,
    '30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0,
    '31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0,
    '31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0,
    '31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0,
    '32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0,
    '32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0,
    '33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0,
    '33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0,
    '33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0,
    '34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0,
    '34555-34681': 0.0, '34681-34807': 0.0}}

data = scenario_summary['Day1']

x = range(len(data))
y = list(data.values())

plt.figure(figsize=(16, 9))
plt.bar(x, y)
plt.subplots_adjust(bottom=0.2)
plt.xticks(x, data.keys(), rotation='vertical')
plt.show()
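
If the bars should sit on a numeric axis with their true widths, rather than on evenly spaced string ticks, the range labels can be parsed into numbers first. A minimal sketch, assuming every key has the form 'low-high' with non-negative integers; the helper names parse_bins and plot_bins are hypothetical.

```python
def parse_bins(freq_by_range):
    """Turn {'low-high': count} into parallel lists: left edges, widths, counts."""
    lefts, widths, counts = [], [], []
    for label, count in freq_by_range.items():
        # Assumes non-negative integers; a negative bound would break split('-')
        low, high = (int(part) for part in label.split('-'))
        lefts.append(low)
        widths.append(high - low)
        counts.append(count)
    return lefts, widths, counts


def plot_bins(freq_by_range):
    # Imported here so the parsing helper has no plotting dependency
    import matplotlib.pyplot as plt

    lefts, widths, counts = parse_bins(freq_by_range)
    plt.figure(figsize=(16, 9))
    # align='edge' anchors each bar at its left edge, so it spans low..high
    plt.bar(lefts, counts, width=widths, align='edge')
    plt.xlabel('value')
    plt.ylabel('frequency')
    plt.show()


sample = {'26869-26995': 12.0, '26995-27121': 19.0, '27121-27247': 7.0}
lefts, widths, counts = parse_bins(sample)
```

With the data above, plot_bins(scenario_summary['Day1']) would draw the chart; matplotlib then picks readable numeric ticks on its own, so the rotated string labels are no longer needed.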

Upvotes: 1

Zaraki Kenpachi

Reputation: 5730

You can use the pandas module to convert the dictionary into a DataFrame:

import pandas as pd
import matplotlib.pyplot as plt

scenario_summary = {'Day1': {'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0,
                         '22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0,
                         '23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0,
                         '23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0,
                         '23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0,
                         '24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0,
                         '24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0,
                         '25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0,
                         '25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0,
                         '25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0,
                         '26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0,
                         '26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0,
                         '26995-27121': 19.0, '27121-27247': 7.000000000000001, '27247-27373': 11.0,
                         '27373-27499': 15.0, '27499-27625': 7.000000000000001, '27625-27751': 4.0,
                         '27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0,
                         '28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0,
                         '28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0,
                         '28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0,
                         '29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0,
                         '29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0,
                         '30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0,
                         '30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0,
                         '30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0,
                         '31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0,
                         '31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0,
                         '31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0,
                         '32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0,
                         '32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0,
                         '33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0,
                         '33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0,
                         '33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0,
                         '34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0,
                         '34555-34681': 0.0, '34681-34807': 0.0}}

# convert to data frame
data_frame = pd.DataFrame.from_dict(scenario_summary)

# plot data
plt.hist(data_frame['Day1'], density=1, bins=20)
plt.show()
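
Note that plt.hist applied to data_frame['Day1'] bins the frequency values themselves (how many ranges have a count of 0, of 1, and so on), not the underlying values. If a histogram over the original value axis is what is wanted, one possible approach is to pass each range's midpoint as a sample, weighted by its frequency, and to reuse the range boundaries as bin edges. A sketch, assuming the ranges are contiguous and in ascending order; the helper names hist_inputs and plot_weighted_hist are hypothetical.

```python
def hist_inputs(freq_by_range):
    """Return (midpoints, weights, edges) for a weighted histogram.

    Assumes 'low-high' keys with non-negative integers, in ascending order,
    where each range starts where the previous one ends.
    """
    mids, weights, edges = [], [], []
    for label, count in freq_by_range.items():
        low, high = (int(part) for part in label.split('-'))
        mids.append((low + high) / 2)
        weights.append(count)
        if not edges:
            edges.append(low)   # left edge of the very first range
        edges.append(high)      # right edge of every range
    return mids, weights, edges


def plot_weighted_hist(freq_by_range):
    # Imported here so hist_inputs can be used without matplotlib
    import matplotlib.pyplot as plt

    mids, weights, edges = hist_inputs(freq_by_range)
    # Each midpoint falls into its own bin and contributes its frequency
    plt.hist(mids, bins=edges, weights=weights)
    plt.show()


sample = {'26869-26995': 12.0, '26995-27121': 19.0}
mids, weights, edges = hist_inputs(sample)
```

Calling plot_weighted_hist(scenario_summary['Day1']) would then show the frequencies against the value ranges. Passing a smaller integer for bins instead of edges would let plt.hist regroup the midpoints into wider bins.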

Upvotes: 1
