Reputation: 155
I am working with a dataset of gunshot deaths in the US, and I am trying to verify the claim that "Around two-thirds of homicide victims who are males in the age-group of 15-34 are black".
Here's my attempt:
import pandas as pd
data = pd.read_csv("./guns-data-master/full_data.csv")
homicides = data[data['intent'] == 'Homicide']
male_homicides = homicides[homicides['sex'] == 'M']
less_thirty_four = male_homicides[male_homicides['age'] <= 34.0]
within_range = less_thirty_four[less_thirty_four['age'] >= 15.0]
within_range.race.value_counts()
This gives me enough information to check the claim. However, I am sure there must be an easier and more efficient way to select all the homicide victims who are male and between 15 and 34 years old.
How can I make this filtering more concise and efficient?
Upvotes: 0
Views: 64
Reputation: 920
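In addition to what @hypnos has mentioned, an alternative (and perhaps more readable) way to do it is the query method. Note that the example below filters on ages 25 to 34; change the lower bound to 15 to match the range in the question.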
import pandas as pd
url = "https://raw.githubusercontent.com/fivethirtyeight/guns-data/master/full_data.csv"
df = pd.read_csv(url, index_col=[0])
df.query("age >= 25 and age <= 34 and intent == 'Homicide' and sex == 'M'") \
.race \
.value_counts()
Black 5901
White 1568
Hispanic 1564
Asian/Pacific Islander 122
Native American/Native Alaskan 90
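Since the claim is about a proportion rather than raw counts, it may help to ask value_counts for fractions directly. The sketch below is only an illustration: it uses a small made-up DataFrame (toy) with the same column names instead of the real CSV, and the names lo, hi, subset and shares are hypothetical. It also shows how query can reference local Python variables with the @ prefix, so the age bounds are not hard-coded in the string.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT taken from full_data.csv.
toy = pd.DataFrame({
    "intent": ["Homicide", "Homicide", "Suicide", "Homicide", "Homicide", "Homicide"],
    "sex":    ["M", "M", "M", "F", "M", "M"],
    "age":    [15.0, 34.0, 20.0, 25.0, 40.0, 22.0],
    "race":   ["Black", "White", "White", "Black", "Black", "Black"],
})

lo, hi = 15, 34  # age bounds from the question (inclusive)

# @lo and @hi refer to the local Python variables defined above.
subset = toy.query("intent == 'Homicide' and sex == 'M' and age >= @lo and age <= @hi")

# normalize=True returns each race's share of the filtered rows instead of a count.
shares = subset["race"].value_counts(normalize=True)
print(shares)
```

On the real data, the same two calls applied to df would give the share for each race in the chosen age range.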
Upvotes: 1
Reputation: 494
Try this:
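import pandas as pd
data = pd.read_csv("./guns-data-master/full_data.csv")
# Combine all conditions into one boolean mask; each comparison needs its own parentheses because & binds tighter than == and <=
homicides = data[(data['intent'] == 'Homicide') & (data['sex'] == 'M') & (data['age'] >= 15.0) & (data['age'] <= 34.0)]
homicides.race.value_counts()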
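A possible variation on the single-mask approach, sketched here under assumptions: Series.between replaces the two age comparisons (it is inclusive on both ends by default), and the share of one race is computed directly as the mean of a boolean column. The helper name black_share and the toy rows are made up for illustration and are not part of the dataset.

```python
import pandas as pd

def black_share(df, lo=15, hi=34):
    """Fraction of male homicide victims aged lo..hi (inclusive) whose race is 'Black'."""
    mask = (
        df["intent"].eq("Homicide")
        & df["sex"].eq("M")
        & df["age"].between(lo, hi)  # inclusive bounds; rows with a missing age evaluate to False
    )
    # The mean of a boolean Series is the fraction of True values.
    return df.loc[mask, "race"].eq("Black").mean()

# Made-up rows for illustration only; these are NOT taken from full_data.csv.
toy = pd.DataFrame({
    "intent": ["Homicide", "Homicide", "Suicide", "Homicide", "Homicide", "Homicide"],
    "sex":    ["M", "M", "M", "F", "M", "M"],
    "age":    [15.0, 34.0, 20.0, 25.0, float("nan"), 22.0],
    "race":   ["Black", "White", "White", "Black", "Black", "Black"],
})

print(black_share(toy))
```

Calling black_share(data) on the real DataFrame would return a single number to compare against the two-thirds claim.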
Upvotes: 0