Filtering with Pandas

Question

I am working with the following dataset (a record of gun deaths in the US) and I am trying to verify the claim that "Around two-thirds of homicide victims who are males in the age-group of 15-34 are black".

Here's my attempt:

import pandas as pd

data = pd.read_csv("./guns-data-master/full_data.csv")
homicides = data[data['intent'] == 'Homicide']
male_homicides = homicides[homicides['sex'] == 'M']
less_thirty_four = male_homicides[male_homicides['age'] <= 34.0]
within_range = less_thirty_four[less_thirty_four['age'] >= 15.0]
within_range.race.value_counts()

This gives me enough information to check the claim. However, I am sure there must be an easier and more efficient way to select all homicide victims who are male and between 15 and 34 years old.

What can I do to make this filtering process more efficient?

dbokers · Accepted Answer

In addition to what @hypnos has mentioned, an alternative way to do it (with perhaps better readability) is to use the query method.

url = "https://raw.githubusercontent.com/fivethirtyeight/guns-data/master/full_data.csv"
df = pd.read_csv(url, index_col=[0])

df.query("age >= 25 and age <= 34 and intent == 'Homicide' and sex == 'M'") \
  .race \
  .value_counts()

Black                             5901
White                             1568
Hispanic                          1564
Asian/Pacific Islander             122
Native American/Native Alaskan      90
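
For comparison, here is a minimal sketch of the same filter written as a single boolean mask, using Series.between for the age range and value_counts(normalize=True) to get shares rather than raw counts, which is closer to what the "two-thirds" claim asks for. Note that the query above uses a lower bound of 25, while the question asks about ages 15 to 34; the sketch uses 15. The rows below are toy data made up for illustration and are not taken from full_data.csv; only the column names (intent, sex, age, race) come from the code above.

```python
import pandas as pd

# Toy rows made up for illustration; they are NOT taken from full_data.csv.
df = pd.DataFrame({
    "intent": ["Homicide", "Homicide", "Suicide", "Homicide", "Homicide", "Homicide"],
    "sex":    ["M", "M", "M", "F", "M", "M"],
    "age":    [15.0, 34.0, 30.0, 20.0, 40.0, 22.0],
    "race":   ["Black", "White", "White", "Black", "Black", "Black"],
})

# One combined boolean mask instead of four intermediate DataFrames.
# Each comparison needs its own parentheses because & binds tighter than ==.
mask = (
    (df["intent"] == "Homicide")
    & (df["sex"] == "M")
    & df["age"].between(15, 34)  # both endpoints are included by default
)

# normalize=True returns each race's share of the filtered rows.
shares = df.loc[mask, "race"].value_counts(normalize=True)
print(shares)

# The same selection via query, mirroring the answer above with a lower bound of 15.
via_query = df.query("age >= 15 and age <= 34 and intent == 'Homicide' and sex == 'M'")
```

Both forms select the same rows; which one reads better is largely a matter of taste. The mask version keeps everything as ordinary Python expressions, while query keeps the condition compact as a string.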
