lastpeony4

Reputation: 577

Python Pandas 2 Column Relation

1st column: Weapon

2nd column: Perpetrator_Age

What I am trying to find is which weapon is popular at which age.

I am trying to draw a line graph where the y-axis is the number of cases, the x-axis is the age of the perpetrator, and each line is a weapon type that the perpetrator used.

You can copy and paste this into Jupyter to initialize the dataset:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
data = pd.read_csv("hdb.csv", low_memory=False)
# Replace spaces in the column names with underscores
data.columns = data.columns.map(lambda x: x.replace(' ', '_'))
# Drop the columns that are not needed
data = data.drop(['Agency_Code', 'Victim_Ethnicity', 'Agency_Name','Agency_Type', 'Perpetrator_Ethnicity', 'Victim_Count', 'Perpetrator_Count'], axis=1)
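# Drop rows whose perpetrator age is zero or blank
data = data[data.Perpetrator_Age != "0"]
data = data[data.Perpetrator_Age != ""]
data = data[data.Perpetrator_Age != " "]
# Drop rows with unknown values in the remaining columns
data = data[data.Victim_Sex != "Unknown"]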
data = data[data.Victim_Race != "Unknown"]
data = data[data.Perpetrator_Sex != "Unknown"]
data = data[data.Perpetrator_Race != "Unknown"]
data = data[data.Relationship != "Unknown"]
data = data[data.Weapon != "Unknown"]
data
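The age filter above compares against strings, which only works while the column is stored as text. A minimal sketch of an alternative, assuming the ages can be parsed as numbers: convert the column with pd.to_numeric and filter numerically. The toy frame and the names toy and cleaned are hypothetical and only stand in for the real CSV.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Kaggle data (not the real file).
toy = pd.DataFrame({
    "Perpetrator_Age": ["15", "0", " ", "27", "42"],
    "Weapon": ["Knife", "Handgun", "Rifle", "Unknown", "Strangulation"],
})

# Parse ages as numbers; blanks and other non-numeric text become NaN.
toy["Perpetrator_Age"] = pd.to_numeric(toy["Perpetrator_Age"], errors="coerce")

# Keep rows with a known, positive age and a known weapon.
# NaN > 0 evaluates to False, so unparseable ages are dropped here as well.
cleaned = toy[(toy["Perpetrator_Age"] > 0) & (toy["Weapon"] != "Unknown")].copy()

# No NaN values remain, so the column can safely be stored as integers.
cleaned["Perpetrator_Age"] = cleaned["Perpetrator_Age"].astype(int)

print(cleaned)
```

With numeric ages, the x-axis of any later plot is sorted by value rather than alphabetically.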

Data set here: https://www.kaggle.com/jyzaguirre/us-homicide-reports

Upvotes: 0

Views: 321

Answers (1)

edesz

Reputation: 12406

IIUC, this grouping of data is likely better shown as a grouped bar chart, such as Seaborn's countplot, than as a line plot, because you want to color by one column (Weapon) while showing a different column (Perpetrator_Age) on the x-axis. AFAIK, a line plot will not capture both of these aggregations simultaneously.

Here is an explicit pandas groupby that shows the aggregation you are referring to (df stands for your DataFrame):

df_grouped = df.groupby(['Perpetrator_Age', 'Weapon']).count()

print(df_grouped)
                               Perpetrator_Race  Relationship
Perpetrator_Age Weapon                                       
15              Blunt Object                  1             1
27              Knife                         1             1
36              Rifle                         1             1
42              Strangulation                 2             2
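As a hedged aside, .count() reports the non-null count of every remaining column, which is why the same number is repeated per column above. A minimal sketch of a more compact form, assuming only the number of rows per group is needed, uses .size() and then unstack to put one weapon in each column. The rows below are hypothetical and chosen only to mirror the printed output.

```python
import pandas as pd

# Hypothetical rows chosen to mirror the printed output above.
df = pd.DataFrame({
    "Perpetrator_Age": [15, 27, 36, 42, 42],
    "Weapon": ["Blunt Object", "Knife", "Rifle",
               "Strangulation", "Strangulation"],
})

# One count per (age, weapon) pair, returned as a Series with a MultiIndex.
counts = df.groupby(["Perpetrator_Age", "Weapon"]).size()

# Move the Weapon level into the columns; pairs that never occur become 0.
wide = counts.unstack(fill_value=0)

print(wide)
```

In this wide table the index is the age and each column is a weapon. If a line chart is still preferred, calling wide.plot() draws one line per column.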

Now, you want to show the first index level (Perpetrator_Age) on the x-axis and use the second index level (Weapon) to color the plotted data.

Here are a few approaches (that do not require groupby):

Seaborn

  • use countplot, which generates a bar plot of counts (corresponding to the number of cases or, in general, the number of records in each grouping) and lets you specify the column used to group the data
  • since you want to color by the Weapon column, pass it to countplot's hue parameter

Imports

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style="whitegrid")

Code

ax = sns.countplot(x="Perpetrator_Age", hue="Weapon", data=df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles, labels=labels)
ax.set_ylabel("Number of cases")

Seaborn approach

Altair

Imports

import altair as alt
alt.renderers.enable('notebook')

Code

alt.Chart(df).mark_bar(size=15).encode(
    alt.Y('count(Weapon):Q', axis=alt.Axis(title='Number of cases')),
    alt.X('Perpetrator_Age:O', axis=alt.Axis(labelAngle=0)),
    color='Weapon:N'
).properties(
    width=250,
    height=250
)

Altair approach

Upvotes: 1
