Reputation: 31
I have a dataset that looks like the following, and I'm using R to work with it. The first three columns (year, id and var) are part of the raw data. I need to create the new variable ans as follows.
If var = 1 in any row for a given id within a year, I need a dummy ans that takes the value 1 for all rows of that id in that year, and 0 otherwise. Sample data with the expected output (ans) is shown below.
year id var ans
2010 1 1 1
2010 2 0 0
2010 1 0 1
2010 1 0 1
2011 2 1 1
2011 2 0 1
2011 1 0 0
2011 1 0 0
Any help on how to achieve this is much appreciated.
Thanks Anup
Upvotes: 2
Views: 196
Reputation: 132864
Use ddply with transform and any:
DF <- read.table(text=" year id var ans
2010 1 1 1
2010 2 0 0
2010 1 0 1
2010 1 0 1
2011 2 1 1
2011 2 0 1
2011 1 0 0
2011 1 0 0", header=TRUE)
library(plyr)
# Split by (year, id); within each group, flag every row with 1 if any var == 1
ddply(DF, .(year, id), transform, ans2 = as.numeric(any(var == 1)))
# year id var ans ans2
# 1 2010 1 1 1 1
# 2 2010 1 0 1 1
# 3 2010 1 0 1 1
# 4 2010 2 0 0 0
# 5 2011 1 0 0 0
# 6 2011 1 0 0 0
# 7 2011 2 1 1 1
# 8 2011 2 0 1 1
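Note that ddply reorders the rows by design: the result comes back sorted by the grouping variables (year, then id) rather than in the original row order.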
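If the original row order matters, one possible alternative (not part of the answer above, just a minimal sketch) is base R's ave, which applies a function within groups and returns a vector aligned with the input rows. The column name ans2 follows the answer's example; the data frame is rebuilt here so the snippet runs on its own.

```r
# Sample data from the question, in its original row order
DF <- data.frame(
  year = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011),
  id   = c(1, 2, 1, 1, 2, 2, 1, 1),
  var  = c(1, 0, 0, 0, 1, 0, 0, 0),
  ans  = c(1, 0, 1, 1, 1, 1, 0, 0)
)

# ave() groups var by (year, id) and recycles the single 0/1 result
# over every row of the group, without reordering the data frame.
# FUN must be passed by name, since unnamed arguments are grouping variables.
DF$ans2 <- ave(DF$var, DF$year, DF$id,
               FUN = function(v) as.numeric(any(v == 1)))

DF
```

Because ave returns values in the same positions as its input, ans2 can be compared row by row against the expected ans column without sorting first.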
Upvotes: 1