user3486773

Reputation: 1246

How can I dynamically create bins in Python?

I have the following np array:

[['ID1', 922.63, 'Product 1'],
['ID1', 1001, 'Product 2'],
['ID1', 800, 'Product 1'],
['ID1', 922.63, 'Product 1'],
['ID1', 1001, 'Product 2'],
['ID2', 800, 'Product 1'],
['ID2', 922.63, 'Product 1'],
['ID2', 1001, 'Product 2'],
['ID3', 800, 'Product 1'],
['ID3', 700.63, 'Product 1'],
['ID3', 1200, 'Product 2'],
['ID3', 850, 'Product 1']]

The second column (the $ amount) is what I care about. I want to build a histogram for Product 1 and Product 2, with each bin spanning $100. The actual data set I'm using has 75K rows and values that range from $1 to $200000. I want to create these 'buckets' automatically from the values and then build the histogram.

I thought it would be easy to find info on this for either pandas or numpy, but either I am a newb who can't follow the other 'similar' solutions, or I am just not finding what I'm looking for. It seems like it should be straightforward.

Upvotes: 0

Views: 3989

Answers (1)

Sebastian Wozny

Reputation: 17506

You can get a histogram by turning your data into a pandas.DataFrame:

import pandas as pd  # plotting with .hist() also requires matplotlib

# Each row is [id, price, product]
a = [['ID1', 922.63, 'Product 1'],
     ['ID1', 1001, 'Product 2'],
     ['ID1', 800, 'Product 1'],
     ['ID1', 922.63, 'Product 1'],
     ['ID1', 1001, 'Product 2'],
     ['ID2', 800, 'Product 1'],
     ['ID2', 922.63, 'Product 1'],
     ['ID2', 1001, 'Product 2'],
     ['ID3', 800, 'Product 1'],
     ['ID3', 700.63, 'Product 1'],
     ['ID3', 1200, 'Product 2'],
     ['ID3', 850, 'Product 1']]

# Name the columns so the price column can be selected by name
q = pd.DataFrame(a, columns=['id', 'price', 'product'])
q.hist(column='price', bins=100)


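You can specify the number of bins you want with the bins parameter. Note that an integer sets how many equal-width bins are drawn, not how wide each one is (a sequence of bin edges is also accepted):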

q.hist(column='price', bins=100)

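Since the question asks for bins that are 100 wide rather than for 100 bins, one possible approach is to build the bin edges yourself and pass them as bins. The following is only a sketch: fixed_width_edges is a hypothetical helper name, not part of pandas or numpy, and it assumes the values are plain numbers.

```python
import math

def fixed_width_edges(values, width=100):
    """Return bin edges spaced `width` apart that cover every value.

    Hypothetical helper for illustration; not a pandas/numpy function.
    """
    # Round the minimum down and the maximum up to a multiple of `width`
    lo = math.floor(min(values) / width) * width
    hi = math.ceil(max(values) / width) * width
    if hi == lo:
        # All values sit on one multiple of `width`: keep at least one bin
        hi = lo + width
    n_bins = round((hi - lo) / width)
    return [lo + i * width for i in range(n_bins + 1)]

# Prices taken from the sample data in the question
prices = [922.63, 1001, 800, 922.63, 1001, 800,
          922.63, 1001, 800, 700.63, 1200, 850]
edges = fixed_width_edges(prices, width=100)
print(edges)  # [700, 800, 900, 1000, 1100, 1200]
```

The resulting list can be passed in place of the integer, for example q.hist(column='price', bins=edges). For values between $1 and $200000 this gives roughly 2000 bins, so a larger width may be easier to read.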

If you want to group it by product use the by parameter:

q.hist(column='price', bins=100, by='product')

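If you want the per-product counts as numbers rather than as a plot, a rough sketch using only the standard library is shown below. It assumes a bucket width of 100 and labels each bucket by its lower bound; the names rows, width and counts are chosen here for illustration.

```python
from collections import Counter

# Sample rows from the question: [id, price, product]
rows = [['ID1', 922.63, 'Product 1'], ['ID1', 1001, 'Product 2'],
        ['ID1', 800, 'Product 1'], ['ID1', 922.63, 'Product 1'],
        ['ID1', 1001, 'Product 2'], ['ID2', 800, 'Product 1'],
        ['ID2', 922.63, 'Product 1'], ['ID2', 1001, 'Product 2'],
        ['ID3', 800, 'Product 1'], ['ID3', 700.63, 'Product 1'],
        ['ID3', 1200, 'Product 2'], ['ID3', 850, 'Product 1']]

width = 100  # assumed bucket width in dollars
counts = Counter()
for _id, price, product in rows:
    # Floor the price to the nearest lower multiple of `width`
    bucket_start = int(price // width) * width
    counts[(product, bucket_start)] += 1

# Print one line per (product, bucket), sorted for readability
for (product, start), n in sorted(counts.items()):
    print(f"{product}: [{start}, {start + width}) -> {n}")
```

Each bucket here is half-open, so a price of exactly 1200 lands in the bucket starting at 1200. A histogram drawn from explicit edges instead counts a value equal to the last edge in the final bin.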

Upvotes: 4
