Reputation: 41
I have a count table as a DataFrame in Python and I want to plot its distribution as a boxplot. E.g.:
df = pandas.DataFrame({'Quality': [29,30,31,32,33,34,35,36,37,38,39,40], 'Count': [3,38,512,2646,9523,23151,43140,69250,107597,179374,840596,38243]})
I 'solved' it by repeating each quality value by its count. But I don't think that's a good way, and my dataframe is getting very, very big.
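For reference, the repetition workaround described above can be sketched roughly as follows. This is only an illustration, assuming numpy is available; the name `expanded` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Quality': [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
    'Count': [3, 38, 512, 2646, 9523, 23151, 43140, 69250,
              107597, 179374, 840596, 38243],
})

# Repeat each Quality value Count times. The result has one element per
# observation, so memory grows with the total count (about 1.3 million here).
expanded = np.repeat(df['Quality'].to_numpy(), df['Count'].to_numpy())

# Quartiles of the expanded sample, i.e. the box a boxplot of it would draw.
q1, median, q3 = np.percentile(expanded, [25, 50, 75])
print(expanded.size, q1, median, q3)
```

The `expanded` array can be handed to `plt.boxplot`, which is exactly the part that scales badly as the counts grow.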
In R it's a one-liner:
ggplot(df, aes(x=1,y=Quality,weight=Count)) + geom_boxplot()
This will output: [image: boxplot from R]
My aim is to compare the distribution of different groups and it should look like
Can Python solve it like this too?
Upvotes: 4
Views: 2953
Reputation: 101
What are you trying to look at here? The boxplot code below produces the following figure.
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
df = pd.DataFrame({'Quality': [29,30,31,32,33,34,35,36,37,38,39,40], 'Count': [3,38,512,2646,9523,23151,43140,69250,107597,179374,840596,38243]})
plt.figure()
df_box = df.boxplot(column='Quality', by='Count',return_type='axes')
If you want to look at your Quality distribution weighted by Count, you can try plotting a histogram:
plt.figure()
df_hist = plt.hist(df.Quality, bins=10, range=None, density=False, weights=df.Count)
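If a boxplot is still wanted without expanding the table, one possible approach is to compute weighted quartiles by hand and pass them to matplotlib's `Axes.bxp`, which draws boxes from precomputed statistics. The sketch below is an assumption-laden illustration, not an equivalent of the R call: `weighted_box_stats` is a hypothetical helper, the quantile rule (smallest value whose cumulative weight share reaches q) may differ slightly from what ggplot2 uses, and the output file name is made up.

```python
import matplotlib.pyplot as plt
import numpy as np


def weighted_box_stats(values, weights, label=""):
    """Hypothetical helper: box statistics for values weighted by counts."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Cumulative share of the total weight, in sorted order.
    cum = np.cumsum(weights) / weights.sum()

    def quantile(q):
        # Smallest value whose cumulative weight share reaches q.
        return values[np.searchsorted(cum, q)]

    q1, med, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    iqr = q3 - q1
    # Whiskers: most extreme observed values within 1.5 * IQR of the box.
    present = values[weights > 0]
    inside = present[(present >= q1 - 1.5 * iqr) & (present <= q3 + 1.5 * iqr)]
    fliers = present[(present < inside.min()) | (present > inside.max())]
    return {"label": label, "q1": q1, "med": med, "q3": q3,
            "whislo": inside.min(), "whishi": inside.max(), "fliers": fliers}


quality = [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
count = [3, 38, 512, 2646, 9523, 23151, 43140, 69250,
         107597, 179374, 840596, 38243]

stats = weighted_box_stats(quality, count, label="all")

fig, ax = plt.subplots()
ax.bxp([stats])  # one dict per box; add more dicts to compare groups
ax.set_ylabel("Quality")
fig.savefig("weighted_boxplot.png")  # hypothetical output file name
```

To compare several groups, build one stats dict per group and pass them all in the list given to `ax.bxp`.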
Upvotes: 1