HernanLG

Reputation: 666

Write code to calculate scores and add them to a data.frame

I have a data.frame with 6 columns. The first is for subjects, the second for blocks in an experiment, and columns 3, 4 and 5 hold the values I need to calculate a binary score (0 or 1), which I want to store in the sixth column (that's why it is currently full of 0s).

head(kfdblock3to9)
        subject time        gr       ugr      sdugr IL
40002.3   40002    3 0.4475618 0.3706000 0.02994533  0
40002.4   40002    4 0.4361786 0.3901111 0.01846110  0
40002.5   40002    5 0.4279880 0.4550000 0.02811839  0
40002.6   40002    6 0.4313647 0.4134444 0.04352974  0
40002.7   40002    7 0.4420889 0.4394286 0.02883143  0
40002.8   40002    8 0.4325227 0.3960000 0.06559222  0

I'm trying to do this with a for loop, but I'm a beginner in R and I'm having difficulties with it. The scoring rule I'm trying to implement is: if the value in column 3 ($gr) is less than the difference between the value in column 4 ($ugr) and .35 times the value in column 5 ($sdugr), the subject receives a 1, otherwise a 0.

What I've tried so far is:

for (i in kfdblock3to9$subject) {
     if (kfdblock3to9$gr<(kfdblock3to9$ugr-(.35*kfdblock3to9$sdugr))) 
                 kfdblock3to9$IL=1
         else kfdblock3to9$IL=0
    }

This gives me 50 warnings, all saying: "the condition has length > 1 and only the first element will be used"

I suppose I'm doing something wrong with the indexes then, but I haven't been able to figure it out. Any help is much appreciated.

Upvotes: 1

Views: 165

Answers (4)

Alex

Reputation: 4180

You shouldn't use a loop in this case. If you do use a loop in the future, you need to index each row with i:

# Loop over row indices and score one row at a time
for (i in seq_len(nrow(kfdblock3to9))) {
  if (kfdblock3to9[i, "gr"] < (kfdblock3to9[i, "ugr"] - 0.35 * kfdblock3to9[i, "sdugr"])) {
    kfdblock3to9[i, "IL"] <- 1
  } else {
    kfdblock3to9[i, "IL"] <- 0
  }
}


kfdblock3to9
        subject time        gr       ugr      sdugr IL
40002.3   40002    3 0.4475618 0.3706000 0.02994533  0
40002.4   40002    4 0.4361786 0.3901111 0.01846110  0
40002.5   40002    5 0.4279880 0.4550000 0.02811839  1
40002.6   40002    6 0.4313647 0.4134444 0.04352974  0
40002.7   40002    7 0.4420889 0.4394286 0.02883143  0
40002.8   40002    8 0.4325227 0.3960000 0.06559222  0

Upvotes: 0

Matthieu Dubois

Reputation: 327

What you want is a logical test. You can thus avoid the use of the loop, and even ifelse, and simply do:

kfdblock3to9$IL <- with(kfdblock3to9, gr < (ugr-0.35*sdugr))

The IL column will contain TRUE or FALSE instead of 1 or 0. If you prefer having integers, you can do:

kfdblock3to9$IL <- as.integer(with(kfdblock3to9, gr < (ugr-0.35*sdugr)))
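To check the result, here is a minimal reproducible sketch. It assumes the six rows shown in the question's head() output are representative; the data.frame() call below just retypes them (row names omitted), so it is an illustration rather than the asker's full data set.

```r
# Sample data retyped from the head() output in the question (assumption:
# the real data frame has the same column names and types).
kfdblock3to9 <- data.frame(
  subject = rep(40002, 6),
  time    = 3:8,
  gr      = c(0.4475618, 0.4361786, 0.4279880, 0.4313647, 0.4420889, 0.4325227),
  ugr     = c(0.3706000, 0.3901111, 0.4550000, 0.4134444, 0.4394286, 0.3960000),
  sdugr   = c(0.02994533, 0.01846110, 0.02811839, 0.04352974, 0.02883143, 0.06559222),
  IL      = 0
)

# One vectorized comparison scores every row at once; as.integer() turns
# TRUE/FALSE into 1/0.
kfdblock3to9$IL <- as.integer(with(kfdblock3to9, gr < (ugr - 0.35 * sdugr)))

print(kfdblock3to9$IL)
# [1] 0 0 1 0 0 0
```

Only the block 5 row scores 1, because it is the only row where gr falls below ugr - 0.35 * sdugr.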

Upvotes: 2

sgibb

Reputation: 25736

To solve your problem I would suggest something like this:

kfdblock3to9[, "IL"] <- ifelse(kfdblock3to9$gr < (kfdblock3to9$ugr - (0.35 * kfdblock3to9$sdugr)), 1, 0)

(A vectorized approach is usually faster than a loop.)

Your loop is wrong because it never uses the index i. You have to use i to access the current row inside the loop:

for (i in seq_len(nrow(kfdblock3to9))) {
    cat("row:", i, kfdblock3to9[i, "subject"], "\n")
}
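
A note on the loop index: a data frame is a list of columns, so seq_along() (like seq(along=)) counts columns rather than rows, and seq_len(nrow(...)) is the safer way to step through rows. A small sketch with a made-up data frame (df and its values are hypothetical, not the asker's data):

```r
# Toy data frame: 3 rows, 2 columns (hypothetical values for illustration).
df <- data.frame(subject = c(40002, 40002, 40003), gr = c(0.44, 0.43, 0.42))

col_index <- seq_along(df)       # one index per column: 1 2
row_index <- seq_len(nrow(df))   # one index per row:    1 2 3

# Iterate over rows, using i to pick out the current row.
for (i in row_index) {
  cat("row:", i, df[i, "subject"], "\n")
}
```

seq_len(nrow(df)) also behaves well for an empty data frame, where 1:nrow(df) would count 1, 0.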

Upvotes: 2

Joris Meys

Reputation: 108523

Take a look at within() and ifelse():

kfdblock3to9 <- 
within(kfdblock3to9,
  IL <- ifelse(gr < ugr - 0.35 * sdugr, 1, 0)
)

within() isn't strictly necessary, but it keeps your code a whole lot more readable and easier to understand.

Why does it go wrong? Because your condition is vectorized. Try

kfdblock3to9$gr<(kfdblock3to9$ugr-(.35*kfdblock3to9$sdugr))

and you will see it returns a logical vector. An if() clause can only deal with one boolean value at a time. If you have a vectorized result, you need a vectorized solution, and that is ifelse().
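
To make the difference concrete, here is a small sketch using two of the rows from the question as plain vectors (the variable names cond, first_only and all_rows are just for illustration). Note that R 4.2.0 and later raise an error instead of a warning when if() receives a condition of length greater than one, so the sketch only ever passes if() a single element.

```r
# Values taken from the block 3 and block 5 rows in the question.
gr    <- c(0.4475618, 0.4279880)
ugr   <- c(0.3706000, 0.4550000)
sdugr <- c(0.02994533, 0.02811839)

# The comparison is vectorized: one TRUE/FALSE per element.
cond <- gr < (ugr - 0.35 * sdugr)

# if() wants exactly one logical value, so it has to be given a single element.
first_only <- if (cond[1]) 1 else 0

# ifelse() walks the whole logical vector and returns one result per element.
all_rows <- ifelse(cond, 1, 0)

print(all_rows)
# [1] 0 1
```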

Upvotes: 2
