Reputation: 577
1st column: Weapon
2nd column: Perpetrator_Age
What I am trying to find is which weapon is most common at which age.
For example, I am trying to draw a graph similar to this:
The y-axis should be the number of cases, the x-axis the age of the perpetrator,
and the lines should be the weapon type that the perpetrator used.
You can copy and paste this into Jupyter to initialize the dataset:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
data = pd.read_csv("hdb.csv", low_memory=False)
cols = data.columns
cols = cols.map(lambda x: x.replace(' ', '_'))
data.columns = cols
# Drop the columns that are not needed, then filter out unknown or empty values
data = data.drop(['Agency_Code', 'Victim_Ethnicity', 'Agency_Name','Agency_Type', 'Perpetrator_Ethnicity', 'Victim_Count', 'Perpetrator_Count'], axis=1)
data = data[data.Perpetrator_Age != "0"]
data = data[data.Perpetrator_Age != ""]
data = data[data.Perpetrator_Age != " "]
data = data[data.Victim_Sex != "Unknown"]
data = data[data.Victim_Race != "Unknown"]
data = data[data.Perpetrator_Sex != "Unknown"]
data = data[data.Perpetrator_Race != "Unknown"]
data = data[data.Relationship != "Unknown"]
data = data[data.Weapon != "Unknown"]
data
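The age filters above compare against strings, so they only work while Perpetrator_Age is read in as text. A minimal sketch of a numeric alternative, assuming blank ages should be treated as missing; the `toy` frame below is hypothetical and only stands in for hdb.csv:

```python
import pandas as pd

# Hypothetical toy rows standing in for hdb.csv (not real data)
toy = pd.DataFrame({
    "Perpetrator_Age": ["27", " ", "0", "42", "15"],
    "Weapon": ["Knife", "Rifle", "Handgun", "Unknown", "Blunt Object"],
})

# Convert ages to numbers; values that cannot be parsed (such as " ") become NaN
age = pd.to_numeric(toy["Perpetrator_Age"], errors="coerce")

# Keep rows with a positive age and a known weapon
mask = age.gt(0) & toy["Weapon"].ne("Unknown")
cleaned = toy.loc[mask].assign(Perpetrator_Age=age[mask].astype(int))
print(cleaned)
```

With a numeric age column, the x-axis of any later plot can be sorted by age rather than by string order.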
Data set here: https://www.kaggle.com/jyzaguirre/us-homicide-reports
Upvotes: 0
Views: 321
Reputation: 12406
IIUC, this grouping of data is likely better shown as a grouped bar chart, such as Seaborn's countplot, than as a line plot, because you want to color by one column (Weapon) but show a different column on the x-axis (Perpetrator_Age). AFAIK, a line plot will not capture these aggregations simultaneously.
Here is an explicit pandas groupby to show the aggregations that you are referencing:
df_grouped = df.groupby(['Perpetrator_Age', 'Weapon']).count()
print(df_grouped)
                               Perpetrator_Race  Relationship
Perpetrator_Age Weapon
15              Blunt Object                  1             1
27              Knife                         1             1
36              Rifle                         1             1
42              Strangulation                 2             2
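To make the aggregation concrete, here is a small self-contained sketch. The toy `df` is hypothetical and only mirrors the counts printed above. It uses `size()` instead of `count()`, since `size()` returns one count per group regardless of how many other columns exist, and `unstack` to pivot the weapons into columns:

```python
import pandas as pd

# Hypothetical toy data mirroring the output above (not the real dataset)
df = pd.DataFrame({
    "Perpetrator_Age": [15, 27, 36, 42, 42],
    "Weapon": ["Blunt Object", "Knife", "Rifle",
               "Strangulation", "Strangulation"],
})

# One count per (age, weapon) pair
counts = df.groupby(["Perpetrator_Age", "Weapon"]).size()

# Pivot to one row per age and one column per weapon; missing pairs become 0
wide = counts.unstack(fill_value=0)
print(counts)
print(wide)
```

The wide table has exactly the layout a grouped chart needs: the row index is what goes on the x-axis and each column is one colored series.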
Now, you want to show the first index level (Perpetrator_Age) on the x-axis, and the second index level (Weapon) must be used to color the plotted data.
Here are a few approaches (that do not require groupby):
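For reference, this mapping (first level on the x-axis, second level as color) can also be sketched with pandas and matplotlib alone. This is an illustration on hypothetical toy data, not the approach recommended in this answer, and it assumes matplotlib is installed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import pandas as pd

# Hypothetical toy data (not the real dataset)
df = pd.DataFrame({
    "Perpetrator_Age": [15, 15, 27, 27, 27, 42],
    "Weapon": ["Knife", "Handgun", "Knife", "Knife", "Rifle", "Handgun"],
})

# Rows are ages (x-axis), columns are weapons (one colored bar series each)
wide = df.groupby(["Perpetrator_Age", "Weapon"]).size().unstack(fill_value=0)

ax = wide.plot(kind="bar", rot=0)
ax.set_xlabel("Perpetrator_Age")
ax.set_ylabel("Number of cases")
```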
Seaborn
Seaborn's countplot generates a bar plot of counts (corresponding to the number of cases or, in general, the number of records in each grouping) and lets you specify the column used to group the data. To group by the Weapon column, countplot accepts the parameter hue, where you can specify this.
Imports
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style="whitegrid")
Code
ax = sns.countplot(x="Perpetrator_Age", hue="Weapon", data=df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=labels)
ax.set_ylabel("Number of cases")
Altair
Imports
import altair as alt
alt.renderers.enable('notebook')
Code
alt.Chart(df).mark_bar(size=15).encode(
    alt.Y('count(Weapon):Q', axis=alt.Axis(title='Number of cases')),
    alt.X('Perpetrator_Age:O', axis=alt.Axis(labelAngle=0)),
    color='Weapon:N'
).properties(
    width=250,
    height=250
)
Upvotes: 1