Marc van der Peet

Reputation: 111

Missing value error in if statement

I have a data.frame like this:

home <- c("MANU","CHELSEA")
away <- c("SWANSEA", "LIVERPOO")
GH <- c(3,4)
GA <- c(2,1)

df <- data.frame(home, away, GH, GA)

I would like to add a POINTS column to df that is filled in based on the result:

calc <- function(df) {

 df$POINTS <- 0

 for(i in 1:nrow(df))

  if(df$GA[i] > df$GH[i]) {
    df$POINTS[i] <- 0.11
  }
  else {
    df$POINTS[i] <- 0.22
    print("a")
  }

}

This, however, gives me this:

 > df
      home     away GH GA POINTS
 1    MANU  SWANSEA  3  2   0.00
 2 CHELSEA LIVERPOO  4  1   0.11

Why aren't the points of the first record 0.11?

Upvotes: 1

Views: 60

Answers (3)

user3793533

I would strongly recommend using data.table instead of data.frame. I find data.table code more readable, it has better support for rule-based data manipulation, and it is also much quicker should your datasets grow.

Here's how you could solve it:

library(data.table)

home <- c("MANU","CHELSEA")
away <- c("SWANSEA", "LIVERPOO")
GH <- c(3,1)
GA <- c(2,3)

dt <- data.table(home, away, GH, GA)
dt[, POINTS:=ifelse(GH>GA, 0.22, 0.11) ]

The data.table() call sets up the data table:

      home     away GH GA
1:    MANU  SWANSEA  3  2
2: CHELSEA LIVERPOO  1  3

And the second adds in your ruleset:

> dt
      home     away GH GA POINTS
1:    MANU  SWANSEA  3  2   0.22
2: CHELSEA LIVERPOO  1  3   0.11

I also corrected the bug of Chelsea actually winning a soccer game. Seems unlikely these days.

Cheers

UPDATE after comment

Aha. It's basically a matter of personal preference. As long as you can establish a clear ruleset, there are many ways to code it. Some people like compact code; I tend to prefer human readability.

Thus you could do it like this:

dt[GH>GA, comment := "home victory"] 
dt[GH<GA, comment := "away victory"] 
dt[GH==GA, comment := "draw"] 

or like this:

dt[, home.points:=ifelse(GH>GA, 3, 0) + ifelse(GH==GA, 1, 0) + ifelse(GH<GA, 0, 0) ]

Check out any tutorial for data.table and you'll easily see how flexible it is for cases like this.
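For what it's worth, the same three-way rule can also be written as one fcase() call. This is only a sketch: it assumes your installed data.table is recent enough to provide fcase(), and it reuses the home.points column name from above.

```r
library(data.table)

# Same data as earlier in this answer
dt <- data.table(
  home = c("MANU", "CHELSEA"),
  away = c("SWANSEA", "LIVERPOO"),
  GH = c(3, 1),
  GA = c(2, 3)
)

# Conditions are checked in order; each condition is followed by its value
dt[, home.points := fcase(
  GH > GA, 3,   # home victory
  GH == GA, 1,  # draw
  GH < GA, 0    # away victory
)]
```

All the values must be of the same type (here they are all doubles), which is stricter than ifelse.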

Upvotes: 2

etienne

Reputation: 3678

If you really want to use a function and a for loop, you could do this:

calc <- function(df) {
    for (i in 1:nrow(df)) { # brackets after the for
        if (df$GA[i] > df$GH[i]) { # no need to initialize POINTS
            df$POINTS[i] <- 0.11
        } else {
            df$POINTS[i] <- 0.22
            print("a")
        }
    }
    return(df) # so that the function "returns" something
}

You can then run df <- calc(df), and df will have the new column with the correct values.
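The reason the return value matters is that R functions work on a copy of their arguments, so assigning to df inside the function does not touch the caller's df. Here is a minimal sketch of that behaviour; the two helper functions no_return and with_return are made up for illustration and are not part of the question.

```r
df <- data.frame(GH = c(3, 4), GA = c(2, 1))

# Hypothetical helper: changes only the local copy and returns nothing useful
no_return <- function(d) {
  d$POINTS <- 0.22
  invisible(NULL)
}

# Hypothetical helper: changes the local copy and hands it back
with_return <- function(d) {
  d$POINTS <- 0.22
  d
}

no_return(df)
unchanged <- !("POINTS" %in% names(df))  # the caller's df has no POINTS column

df2 <- with_return(df)
changed <- "POINTS" %in% names(df2)      # the returned copy does
```

So whatever the loop computes inside calc is lost unless the function returns df and the result is assigned back.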

I would, however, recommend using ifelse: df$POINTS <- ifelse(df$GA > df$GH, 0.11, 0.22)

You can of course nest multiple ifelse calls. The first argument is the test, the second is the value where the test is TRUE, and the last is the value where the test is FALSE.

Example of nested ifelse calls:

ifelse(df$home=='MANU',0.3,ifelse(df$GA>df$GH,0.11,0.22))
# [1] 0.30 0.22 # as expected
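The same nesting covers a win/draw/loss rule. The sketch below uses made-up scores (not the data from the question) so that all three outcomes appear, and the 3/1/0 points and the name home_points are assumptions for illustration.

```r
# Hypothetical scores chosen to cover a home win, an away win and a draw
GH <- c(3, 1, 2)
GA <- c(2, 3, 2)

# Outer test: home win; inner test: draw; everything else is an away win
home_points <- ifelse(GH > GA, 3,
                      ifelse(GH == GA, 1, 0))
home_points
# [1] 3 0 1
```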

Upvotes: 0

akrun

Reputation: 887118

We don't need a loop for this:

df$POINTS <- c(0.22, 0.11)[(df$GA>df$GH)+1L]

Or we can use ifelse as well.
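To unpack the indexing trick: a logical vector is coerced to 0/1 when you add an integer to it, so adding 1L turns FALSE into index 1 and TRUE into index 2. A small sketch using the data from the question; the intermediate names away_win, idx, points_by_index and points_by_ifelse are only there for illustration.

```r
df <- data.frame(
  home = c("MANU", "CHELSEA"),
  away = c("SWANSEA", "LIVERPOO"),
  GH = c(3, 4),
  GA = c(2, 1)
)

away_win <- df$GA > df$GH                 # logical vector, one entry per row
idx <- away_win + 1L                      # FALSE -> 1, TRUE -> 2
points_by_index <- c(0.22, 0.11)[idx]     # pick 0.22 or 0.11 by position
points_by_ifelse <- ifelse(away_win, 0.11, 0.22)

identical(points_by_index, points_by_ifelse)
# [1] TRUE
```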

Upvotes: 1
