Reputation: 63
I have a wide pandas dataframe with a number of variables with titles of the form 'crimeYR.' For example, crime1996 is a dummy variable that tells me if an observation has been convicted of a crime up until the year 1996, crime1998 tells me whether he/she has been convicted of a crime up until 1998, and so forth. Everyone in the dataset is born in different years and I would like to create a single variable that tells me whether or not a person has committed a crime by age 25. Here is an example of what I would like to create
birthYR crime2006 crime2008 crime2010 crimeby25
1981 0 1 1 0
1981 1 1 1 1
1983 0 1 1 1
1982 0 0 1 0
I have a general idea of how one might code it in stata, but I am struggling to get it to work in python. Here's an idea of how it might work in stata:
gen crimeby25 = 0
foreach v of num 2006/2016{
replace crimeby25 = crime`v' if `v' - birthyr == 25
}
What's a simple way of doing what I'm trying to do in Python?
Upvotes: 1
Views: 59
Reputation: 1030
Here is a solution. You have the following dataframe :
df = pd.DataFrame({'birthYR': [1981,1981,1983,1982],
'crime2006': [0,1,0,0],
'crime2008': [1,1,1,0],
'crime2010':[1,1,1,1]})
df
birthYR crime2006 crime2008 crime2010 crimeby25
1981 0 1 1 0
1981 1 1 1 1
1983 0 1 1 1
1982 0 0 1 0
Let's first define the list of years that we are studying :
years = [2006,2008,2010]
We create a few intermediary useful columns
for year in years :
# Compute the age at a given year
df["AgeIn"+str(year)] = year - df["birthYR"]
# Is he/she more than 25 at a given year
df["NotMoreThan25In"+str(year)] = df["AgeIn"+str(year)]<=25
# Let's remove age column for clarity
df = df.drop("AgeIn"+str(year),axis=1)
# Check if he/she commited crime and was not more than 25 at a given year
df["NotMoreThan25In"+str(year)+"AndCrime"] = df["NotMoreThan25In"+str(year)]*df["crime"+str(year)]
In the end, we just sum the total on each year to see if he/she commited crime by 25 :
df["crimeby25"] = df[["Not25In"+str(year)+"AndCrime" for year in years]].max(axis=1)
df["crimeby25"]
0 0
1 1
2 1
3 0
Upvotes: 1