Reputation: 1246
I have the following np array:
[['ID1', 922.63, 'Product 1'],
['ID1', 1001, 'Product 2'],
['ID1', 800, 'Product 1'],
['ID1', 922.63, 'Product 1'],
['ID1', 1001, 'Product 2'],
['ID2', 800, 'Product 1'],
['ID2', 922.63, 'Product 1'],
['ID2', 1001, 'Product 2'],
['ID3', 800, 'Product 1'],
['ID3', 700.63, 'Product 1'],
['ID3', 1200, 'Product 2'],
['ID3', 850, 'Product 1']]
The second column (the $ amount) is what I care about. I want to build a histogram for Product 1 and Product 2, with bins that are each 100 wide. The actual data set I'm using has 75K rows and values that range from $1 to $200,000. I want to create these buckets automatically and then build the histogram.
I thought it would be easy to find info on this for either pandas or numpy, but either I'm too new to follow the similar solutions I've found, or I'm just not finding what I'm looking for. It seems like it should be straightforward.
Upvotes: 0
Views: 3989
Reputation: 17506
You can get a histogram by turning your data into a pandas.DataFrame:
a = [['ID1', 922.63, 'Product 1'],
['ID1', 1001, 'Product 2'],
['ID1', 800, 'Product 1'],
['ID1', 922.63, 'Product 1'],
['ID1', 1001, 'Product 2'],
['ID2', 800, 'Product 1'],
['ID2', 922.63, 'Product 1'],
['ID2', 1001, 'Product 2'],
['ID3', 800, 'Product 1'],
['ID3', 700.63, 'Product 1'],
['ID3', 1200, 'Product 2'],
['ID3', 850, 'Product 1']]
import pandas as pd

# Name the columns so the price column can be selected by label
q = pd.DataFrame(a, columns=['id', 'price', 'product'])
q.hist(column='price', bins=100)
You can specify the number of bins you want with the bins parameter. Note that an integer here sets how many equal-width bins are drawn, not how wide each bin is:
q.hist(column='price', bins=100)
If you want to group it by product, use the by parameter:
q.hist(column='price', bins=100,by='product')
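Since the question asks for bins that are 100 wide rather than 100 bins, here is a rough sketch of one way to get that: build the bin edges yourself with np.arange and pass them as bins. The names width, lo, hi, edges and counts are just illustrative, and rounding the edges outward to multiples of 100 is my assumption about what "sized by 100" should mean.

```python
import numpy as np
import pandas as pd

a = [['ID1', 922.63, 'Product 1'],
     ['ID1', 1001, 'Product 2'],
     ['ID1', 800, 'Product 1'],
     ['ID1', 922.63, 'Product 1'],
     ['ID1', 1001, 'Product 2'],
     ['ID2', 800, 'Product 1'],
     ['ID2', 922.63, 'Product 1'],
     ['ID2', 1001, 'Product 2'],
     ['ID3', 800, 'Product 1'],
     ['ID3', 700.63, 'Product 1'],
     ['ID3', 1200, 'Product 2'],
     ['ID3', 850, 'Product 1']]

q = pd.DataFrame(a, columns=['id', 'price', 'product'])
q['price'] = q['price'].astype(float)

width = 100  # assumed bucket size, taken from the question

# Round the range outward to multiples of `width`; the upper edge is pushed
# one full bucket past the maximum so the largest value gets its own bucket.
lo = np.floor(q['price'].min() / width) * width
hi = (np.floor(q['price'].max() / width) + 1) * width
edges = np.arange(lo, hi + width, width)

# Count the rows per bucket for each product, without plotting anything.
counts = {
    name: np.histogram(group['price'], bins=edges)[0]
    for name, group in q.groupby('product')
}
```

Passing the same edges to the plotting call, as in q.hist(column='price', bins=edges, by='product'), should then draw one histogram per product with 100-wide bars. On the full data set ($1 to $200,000) this works out to roughly 2,000 buckets, so a wider bucket may be easier to read.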
Upvotes: 4