jamzsabb

Reputation: 1154

Vectorized 'if' statement without 'ifelse'

I have figured out a very inefficient way of using vectors in an if statement, but can't figure out how to use ifelse() or sapply() or any better way of doing it.

I have the following data:

yes_codes <- c(1,3,7)
yes_flags <- 'Y'
yes_year <- 2011
df2 <- data.frame(yes_codes, yes_flags, yes_year)
codes <- c(1:10)
flag <- 'N'
year <- c(2011,2012,2011,2012,2011,2013,2014,2015,2011,2010)
df <- data.frame(codes, flag, year)

> df
   codes flag year
1      1    N 2011
2      2    N 2012
3      3    N 2011
4      4    N 2012
5      5    N 2011
6      6    N 2013
7      7    N 2014
8      8    N 2015
9      9    N 2011
10    10    N 2010
> df2
  yes_codes yes_flags yes_year
1         1         Y     2011
2         3         Y     2011
3         7         Y     2011

I need to match df$codes against df2$yes_codes and set df$flag to 'Y' where they match. The only way I have figured out how to do this is very obviously wrong:

for(i in 1:nrow(df)) {
  for(z in 1:nrow(df2)){
    if(df$year[i]==2011 | df$year[i]==2012)
      if(as.character(df$codes[i])==as.character(df2$yes_codes[z]))
        if(df$year[i]==df2$yes_year[z])
          df$flag[i] <- 'Y'
  }
}

I know you're supposed to use ifelse() to do vectorized if statements, but this doesn't work either

ifelse(df$year==2011 | df$year==2012, ifelse(df$code==df2$yes_code, 
ifelse(df$year==df2$year, df$flag <- 'Y',
            df$flag <- 'N'), df$flag <- 'N'), df$flag <- 'N')

This sets EVERY flag to 'Y' or 'N' with every iteration and all I get is whatever was set last, which is usually 'N'. I really thought I had found a perfect example of why you use <- and = for different things, but it won't even run when I switch the <- for =.

EDIT:
As Sotos explained to me, ifelse() simply returns a value, so I need to assign its result rather than set values inside it. My problem now is that I actually have several conditions to check: for example, one rule applies to 2011 and 2012 and another applies to 2012 and later. Writing multiple ifelse() statements as follows just overwrites the output of the previous one with the else value:

df$flag <- ifelse(df$year==2013 & df$codes==df2$yes_code & df$year==df2$yes_year, 'Y', 'N')
df$flag <- ifelse(df$year >= 2012 & df$codes=='4', 'Y', 'N')
df$flag <- ifelse((df$year==2011 | df$year==2012) & df$codes==df2$yes_codes & df$year==df2$yes_year, 'Y', 'N')

It's having to use else that is making this so difficult. Is there any other way to write a vectorized if statement?

Upvotes: 1

Views: 777

Answers (3)

jamzsabb

Reputation: 1154

To summarize the info I got in this thread, the answer to my first problem was: don't try to set values inside ifelse(); use ifelse() to return a value and assign that.
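A tiny illustration of that point, using toy values that are not from the thread: ifelse() hands back a whole vector, and the assignment happens once, outside the call.

```r
# Toy input, made up for this illustration.
year <- c(2011, 2012, 2013)

# ifelse() returns one element per element of the test; assign the result.
flag <- ifelse(year %in% 2011:2012, 'Y', 'N')
```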

My second problem was the else portion of each statement overwriting the results of previous statements. The answer was maddeningly simple: just return the current value. So the following

# %in% rather than ==, since df and df2 have different numbers of rows
df$flag <- ifelse((df$year==2011 | df$year==2012) & df$codes %in% df2$yes_codes &
                  df$year %in% df2$yes_year, 'Y', 'N')

becomes this

df$flag <- ifelse((df$year==2011 | df$year==2012) & df$codes %in% df2$yes_codes &
                  df$year %in% df2$yes_year, 'Y', df$flag)
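Here is a rough sketch of how several rules might be chained with this pattern, using the example data from the question. The helper vector in_df2 and the paste()-based matching of (code, year) pairs are assumptions of this sketch, not something suggested in the thread. stringsAsFactors = FALSE is set so that flag stays a character column on older R versions.

```r
# Rebuild the example data from the question.
df  <- data.frame(codes = 1:10, flag = 'N',
                  year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010),
                  stringsAsFactors = FALSE)
df2 <- data.frame(yes_codes = c(1, 3, 7), yes_flags = 'Y', yes_year = 2011,
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where the (codes, year) pair of a df row occurs in df2.
in_df2 <- paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year)

# Rule 1: years 2011-2012 with a matching pair; other rows keep their current flag.
df$flag <- ifelse(df$year %in% 2011:2012 & in_df2, 'Y', df$flag)

# Rule 2: 2012 or later with code 4; again fall back to the current flag.
df$flag <- ifelse(df$year >= 2012 & df$codes == 4, 'Y', df$flag)
```

Because every call falls back to df$flag, the order of the rules does not matter here: a later rule can only add a 'Y', never reset one.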

Thanks to all who helped, this was a very difficult question to articulate.

Upvotes: 0

jogo

Reputation: 12559

Here is a solution with data.table:

library("data.table")
dt2 <- data.table(yes_codes=c(1,3,7), yes_flags='Y',yes_year=2011)
dt  <- data.table(codes=(1:10), flag='N', year=c(2011,2012,2011,2012,2011,2013,2014,2015,2011,2010))

dt[dt2, on=c(codes="yes_codes", year="yes_year"), in.df2:=i.yes_flags]

dt[year==2013 & in.df2=='Y', flag:='Y']
dt[year>=2012 & codes==4, flag:='Y']
dt[(year==2011 | year==2012) & in.df2=='Y', flag:='Y']
dt
#    codes flag year in.df2
# 1:     1    Y 2011      Y
# 2:     2    N 2012     NA
# 3:     3    Y 2011      Y
# 4:     4    Y 2012     NA
# 5:     5    N 2011     NA
# 6:     6    N 2013     NA
# 7:     7    N 2014     NA
# 8:     8    N 2015     NA
# 9:     9    N 2011     NA
# 10:    10    N 2010     NA

Or you can do it in one big condition:

dt[(year==2013 & in.df2=='Y') | (year>=2012 & codes==4) | 
               ((year==2011 | year==2012) & in.df2=='Y'), flag:='Y']

You can also combine the first and the third condition:

dt[((year==2011 | year==2012 | year==2013) & in.df2=='Y') | (year>=2012 & codes==4), flag:='Y']
# and shorten it:
dt[((year %in% 2011:2013) & in.df2=='Y') | (year>=2012 & codes==4), flag:='Y']
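For readers who want to stay in base R, the same "assign only where the condition holds" idea can probably be sketched with logical subsetting. This is not part of the data.table solution above; the helper vector in_df2 and the paste()-based pair matching are assumptions made for this sketch.

```r
# Rebuild the example data from the question as plain data frames.
df  <- data.frame(codes = 1:10, flag = 'N',
                  year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010),
                  stringsAsFactors = FALSE)
df2 <- data.frame(yes_codes = c(1, 3, 7), yes_flags = 'Y', yes_year = 2011,
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where the (codes, year) pair of a df row occurs in df2.
in_df2 <- paste(df$codes, df$year) %in% paste(df2$yes_codes, df2$yes_year)

# Assign only where the combined condition is TRUE; all other rows are left
# untouched, so no "else" value is needed.
cond <- (df$year %in% 2011:2013 & in_df2) | (df$year >= 2012 & df$codes == 4)
df$flag[cond] <- 'Y'
```

Unlike data.table's i argument, base R subsetting assignment does not tolerate NA in the index, which is one reason in_df2 is built with %in% (it never returns NA).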

Upvotes: 1

user5099519

Reputation:

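# Left join on the code only (the year is not part of the match)
df3<-merge(df, df2, by.x='codes', by.y='yes_codes',all.x = TRUE)
# Rows found in df2 have yes_flags=="Y"; unmatched rows yield NA
df3$flag<-ifelse(df3$yes_flags=="Y", "Y", "N")
df3$flag[is.na(df3$flag)]<-"N"
# Drop the columns that came from df2
df<-df3[,!(names(df3) %in% names(df2))]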
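The merge above joins on the code alone. If the year also has to match, as in the question's loop, a variant along these lines might work. It is only a sketch on the question's example data, and it covers the df2 lookup only, not the extra rule for code 4.

```r
# Rebuild the example data from the question.
df  <- data.frame(codes = 1:10, flag = 'N',
                  year = c(2011, 2012, 2011, 2012, 2011, 2013, 2014, 2015, 2011, 2010),
                  stringsAsFactors = FALSE)
df2 <- data.frame(yes_codes = c(1, 3, 7), yes_flags = 'Y', yes_year = 2011,
                  stringsAsFactors = FALSE)

# Left join on both keys, so a row only matches when code and year agree.
df3 <- merge(df, df2,
             by.x = c('codes', 'year'),
             by.y = c('yes_codes', 'yes_year'),
             all.x = TRUE)

# Rows without a partner in df2 have NA in yes_flags.
df3$flag <- ifelse(is.na(df3$yes_flags), 'N', 'Y')

# Drop the helper column that came from df2.
df3$yes_flags <- NULL
```

Note that merge() may reorder rows and moves the key columns to the front, so df3 should be compared with df by codes rather than by position.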

Upvotes: 1
