Reputation: 314
I am using pandas, Jupyter notebooks and python. I have a following dataset as a dataframe
Cars,Country,Type
1564,Australia,Stolen
200,Australia,Stolen
579,Australia,Stolen
156,Japan,Lost
900,Africa,Burnt
2000,USA,Stolen
1000,Indonesia,Stolen
900,Australia,Lost
798,Australia,Lost
128,Australia,Lost
200,Australia,Burnt
56,Australia,Burnt
348,Australia,Burnt
1246,USA,Burnt
I would like to know how I can use a box plot to answer the following question "Number of cars in Australia that were affected by each type". So basically, I should have 3 boxplots(for each type) showing the number of cars affected in Australia.
Please keep in mind that this is a subset of the real dataset.
Upvotes: 0
Views: 663
Reputation: 29711
You can select only the rows corresponding to "Australia"
from the column "Country"
and group it by the column "Type"
as shown:
from StringIO import StringIO
import pandas as pd
text_string = StringIO(
"""
Cars,Country,Type,Score
1564,Australia,Stolen,1
200,Australia,Stolen,2
579,Australia,Stolen,3
156,Japan,Lost,4
900,Africa,Burnt,5
2000,USA,Stolen,6
1000,Indonesia,Stolen,7
900,Australia,Lost,8
798,Australia,Lost,9
128,Australia,Lost,10
200,Australia,Burnt,11
56,Australia,Burnt,12
348,Australia,Burnt,13
1246,USA,Burnt,14
""")
df = pd.read_csv(text_string, sep = ",")
# Specifically checks in column name "Cars"
group = df.loc[df['Country'] == 'Australia'].boxplot(column = 'Cars', by = 'Type')
Upvotes: 2