Reputation: 1154
I have figured out a very inefficient way of using vectors in an if statement, but can't figure out how to use ifelse() or sapply() or any better way of doing it.
I have the following data:
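yes_codes <- c(1,3,7)
yes_flags <- 'Y'
yes_year <- 2011
# stringsAsFactors = FALSE keeps the flags as character (the default before R 4.0 was factors)
df2 <- data.frame(yes_codes, yes_flags, yes_year, stringsAsFactors = FALSE)
codes <- c(1:10)
flag <- 'N'
year <- c(2011,2012,2011,2012,2011,2013,2014,2015,2011,2010)
df <- data.frame(codes, flag, year, stringsAsFactors = FALSE)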
> df
   codes flag year
1      1    N 2011
2      2    N 2012
3      3    N 2011
4      4    N 2012
5      5    N 2011
6      6    N 2013
7      7    N 2014
8      8    N 2015
9      9    N 2011
10    10    N 2010
> df2
  yes_codes yes_flags yes_year
1         1         Y     2011
2         3         Y     2011
3         7         Y     2011
I need to match df$codes with df2$yes_codes (and df$year with df2$yes_year) and set df$flag to 'Y' when they match. The only way I have figured out how to do this is a nested loop, which is very obviously the wrong approach:
for (i in 1:nrow(df)) {
  for (z in 1:nrow(df2)) {
    # only 2011 and 2012 are eligible
    if (df$year[i] == 2011 || df$year[i] == 2012)
      # the code and the year must both match the same row of df2
      if (df$codes[i] == df2$yes_codes[z])
        if (df$year[i] == df2$yes_year[z])
          df$flag[i] <- 'Y'
  }
}
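For comparison, a vectorized sketch of what the nested loop computes, assuming the question's data. It builds one key per row by pasting code and year together (the in_df2 name is hypothetical), so the two inner comparisons become a single %in% lookup against df2; this is an illustrative technique, not something proposed in the thread.

```r
yes_codes <- c(1, 3, 7)
yes_flags <- 'Y'
yes_year  <- 2011
df2 <- data.frame(yes_codes, yes_flags, yes_year, stringsAsFactors = FALSE)

codes <- 1:10
flag  <- 'N'
year  <- c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010)
df <- data.frame(codes, flag, year, stringsAsFactors = FALSE)

# One "code year" string per row, e.g. "1 2011". A row matches when the same
# (code, year) pair appears in df2, which is what the two inner ifs check.
in_df2 <- paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year)

# The outer if restricted the loop to 2011 and 2012; only those rows change.
df$flag[(df$year == 2011 | df$year == 2012) & in_df2] <- 'Y'
df$flag
```

Only rows 1 and 3 become 'Y': code 7 is in df2, but its row in df has year 2014.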
I know you're supposed to use ifelse() to do vectorized if statements, but this doesn't work either:
ifelse(df$year==2011 | df$year==2012,
       ifelse(df$codes==df2$yes_codes,
              ifelse(df$year==df2$yes_year, df$flag <- 'Y', df$flag <- 'N'),
              df$flag <- 'N'),
       df$flag <- 'N')
This sets EVERY flag to 'Y' or 'N' with every iteration, and all I get is whatever was set last, which is usually 'N'. I really thought I had found a perfect example of why you use <- and = for different things, but it won't even run when I switch the <- for =.
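A minimal sketch of why the assignments misbehave, using made-up vectors: ifelse() treats yes and no as ordinary arguments, so each expression is evaluated once for the whole vector (whenever at least one element needs it), not once per element. In current R versions the no branch is evaluated after yes, so the side effect left behind is the last assignment.

```r
test <- c(TRUE, FALSE, TRUE)

# Anti-pattern: the assignments inside ifelse() are whole-variable side
# effects, not per-element updates.
flag <- rep('N', 3)
res  <- ifelse(test, flag <- 'Y', flag <- 'N')
res        # "Y" "N" "Y": the per-element vector ifelse() returns
flag       # "N": a single value, left by the last branch evaluated
bad_flag <- flag

# Working pattern: let ifelse() build the vector, then assign it once.
flag <- ifelse(test, 'Y', 'N')
flag       # "Y" "N" "Y"
```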
EDIT:
As Sotos explained to me, ifelse() simply returns a value (a vector), so I need to assign its result outside of it. My problem now is that I actually have several ifelse() conditions to check: for example, one rule applies to 2011 and 2012 and another applies to 2012 and later. When written as follows, each ifelse() statement overwrites the output of the previous one with its else output:
# TRUE where the (codes, year) pair appears in df2
in_df2 <- paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year)
df$flag <- ifelse(df$year==2013 & in_df2, 'Y', 'N')
df$flag <- ifelse(df$year >= 2012 & df$codes==4, 'Y', 'N')
df$flag <- ifelse((df$year==2011 | df$year==2012) & in_df2, 'Y', 'N')
It's having to supply an else value that makes this so difficult. Is there any other way to write a vectorized if statement?
Upvotes: 1
Views: 777
Reputation: 1154
To summarize the info I got in this thread, the answer to my first problem was: don't try to set values inside an ifelse(); use ifelse() to return a value and assign that.
The answer to my second problem, the else portion of each statement overwriting the previous statements, was maddeningly simple: in the else branch, just return the current value. So the following
df$flag <- ifelse((df$year==2011 | df$year==2012) &
                  paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year),
                  'Y', 'N')
becomes this
df$flag <- ifelse((df$year==2011 | df$year==2012) &
                  paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year),
                  'Y', df$flag)
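Applied to all three rules from the question, the pattern looks like this. It is a sketch assuming the question's data; the in_df2 helper (a pairwise code-and-year match via paste() and %in%) is an illustrative choice, not something from the thread. Rows 1, 3 and 4 end up 'Y', the same rows the data.table answer flags.

```r
df2 <- data.frame(yes_codes = c(1, 3, 7), yes_flags = 'Y', yes_year = 2011,
                  stringsAsFactors = FALSE)
df <- data.frame(codes = 1:10, flag = 'N',
                 year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where the (codes, year) pair appears in df2.
in_df2 <- paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year)

# Each rule can only switch rows to 'Y'; the else branch returns the current
# flag, so a later rule no longer resets what an earlier rule set.
df$flag <- ifelse(df$year == 2013 & in_df2, 'Y', df$flag)
df$flag <- ifelse(df$year >= 2012 & df$codes == 4, 'Y', df$flag)
df$flag <- ifelse((df$year == 2011 | df$year == 2012) & in_df2, 'Y', df$flag)
df
```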
Thanks to all who helped; this was a very difficult question to articulate.
Upvotes: 0
Reputation: 12559
Here is a solution with data.table:
library("data.table")
dt2 <- data.table(yes_codes=c(1,3,7), yes_flags='Y', yes_year=2011)
dt <- data.table(codes=(1:10), flag='N', year=c(2011,2012,2011,2012,2011,2013,2014,2015,2011,2010))
# update join: copy yes_flags into dt wherever (codes, year) matches a row of dt2
dt[dt2, on=c(codes="yes_codes", year="yes_year"), in.df2:=i.yes_flags]
# each rule updates only the rows it selects, so later rules don't undo earlier ones
dt[year==2013 & in.df2=='Y', flag:='Y']
dt[year>=2012 & codes==4, flag:='Y']
dt[(year==2011 | year==2012) & in.df2=='Y', flag:='Y']
dt
#     codes flag year in.df2
#  1:     1    Y 2011      Y
#  2:     2    N 2012     NA
#  3:     3    Y 2011      Y
#  4:     4    Y 2012     NA
#  5:     5    N 2011     NA
#  6:     6    N 2013     NA
#  7:     7    N 2014     NA
#  8:     8    N 2015     NA
#  9:     9    N 2011     NA
# 10:    10    N 2010     NA
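A base-R sketch of the same two steps, assuming the question's df and df2; it is not part of the data.table answer. match() on a pasted (code, year) key stands in for the update join, and assigning into df$flag[which(...)] touches only the selected rows, much like flag:='Y' with an i condition. which() also drops the NA comparisons produced by unmatched rows.

```r
df2 <- data.frame(yes_codes = c(1, 3, 7), yes_flags = 'Y', yes_year = 2011,
                  stringsAsFactors = FALSE)
df <- data.frame(codes = 1:10, flag = 'N',
                 year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010),
                 stringsAsFactors = FALSE)

# Counterpart of the update join: position of each (codes, year) pair in df2,
# NA when there is no partner, so in.df2 is NA for unmatched rows.
key <- match(paste(df$codes, df$year), paste(df2$yes_codes, df2$yes_year))
df$in.df2 <- df2$yes_flags[key]

# Counterpart of dt[condition, flag := 'Y']: only rows where the condition is
# TRUE change; which() discards FALSE and NA results.
df$flag[which(df$year == 2013 & df$in.df2 == 'Y')] <- 'Y'
df$flag[which(df$year >= 2012 & df$codes == 4)] <- 'Y'
df$flag[which((df$year == 2011 | df$year == 2012) & df$in.df2 == 'Y')] <- 'Y'
df
```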
Or you can do it in one big condition:
dt[(year==2013 & in.df2=='Y') | (year>=2012 & codes==4) |
((year==2011 | year==2012) & in.df2=='Y'), flag:='Y']
You can also combine the first and third conditions:
dt[((year==2011 | year==2012 | year==2013) & in.df2=='Y') | (year>=2012 & codes==4), flag:='Y']
# and shorten it:
dt[((year %in% 2011:2013) & in.df2=='Y') | (year>=2012 & codes==4), flag:='Y']
Upvotes: 1
Reputation:
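# left join on codes: every row of df is kept, unmatched rows get NA from df2
df3 <- merge(df, df2, by.x = 'codes', by.y = 'yes_codes', all.x = TRUE)
df3$flag <- ifelse(df3$yes_flags == "Y", "Y", "N")
# rows with no match in df2 have NA in yes_flags, so treat them as "N"
df3$flag[is.na(df3$flag)] <- "N"
# drop the columns that came from df2
df <- df3[, !(names(df3) %in% names(df2))]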
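The merge above joins on codes only, so row 7 (code 7, year 2014) also comes out 'Y', although the question requires the year to match as well. A sketch that merges on both columns, assuming the question's data; it reorders by codes explicitly rather than relying on merge() sorting the result by its key columns.

```r
df2 <- data.frame(yes_codes = c(1, 3, 7), yes_flags = 'Y', yes_year = 2011,
                  stringsAsFactors = FALSE)
df <- data.frame(codes = 1:10, flag = 'N',
                 year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010),
                 stringsAsFactors = FALSE)

# Left join on code AND year; all.x = TRUE keeps every row of df.
df3 <- merge(df, df2, by.x = c('codes', 'year'), by.y = c('yes_codes', 'yes_year'),
             all.x = TRUE)

# Unmatched rows have NA in yes_flags; count those as 'N'.
df3$flag <- ifelse(!is.na(df3$yes_flags) & df3$yes_flags == 'Y', 'Y', 'N')

# Put the rows back in code order and keep only df's original columns.
df3 <- df3[order(df3$codes), ]
df <- df3[, c('codes', 'flag', 'year')]
df
```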
Upvotes: 1