Pcrainy Huang

Reputation: 13

Find the first value per case and create a new variable based on it

I have a data frame with 184 observations of 5 variables:

'data.frame':   184 obs. of  5 variables:
     $ Cat     : Factor w/ 10 levels "99-001","99-002",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ No      : int  1 1 1 1 1 1 1 1 1 1 ...
     $ ehs     : int  0 0 0 0 0 0 0 0 0 0 ...
     $ Onset   : int  0 0 0 0 0 0 0 9 9 9 ...
     $ STARTING: Factor w/ 149 levels "1:37PM1","1:42PM1",..: 3 4 5 63 64 65 66 67... 

The data frame comes from a repeated-measures study, which means each case was measured several times.

Now I want to create a new variable (provoke) based on the onset of each case. If the first Onset value for a case is 0, then provoke is coded 0; otherwise it is coded 1.
My R script:

no1 <- seq[seq$No == 1, ]
if (no1[1,4]==0) {no1$provoke =0} else {no1$provoke =1}
no2 <- seq[seq$No == 2, ]
if (no2[1,4]==0) {no2$provoke = 0} else {no2$provoke = 1}    

Because there are many cases, I want to write a loop to do this:

 for (i in 1:10) {    
 noi <- seq[seq$No == i, ]    
 if (noi[1,4]==0) {    
 noi$provoke = 0}     
 else {noi$provoke = 1}    
}

But the loop does not seem to work. Could you please help me find the bug or point out my mistake?

Upvotes: 0

Views: 221

Answers (1)

Roman Luštrik

Reputation: 70623

seq is a really bad name to choose for a data.frame, because it is also the name of a base R function. Let's call it xy for this example.

xy <- data.frame(case = rep(1:5, each = 10), oldvar = rbinom(50, size = 1, prob = 0.5))

xy.split <- split(xy, f = xy$case)

manipulateXY <- function(x) {
  if (x[1, "oldvar"] == 0) {
    x$newvar <- 0
  } else {
    x$newvar <- 1
  }
  x
}

xy.newvar <- lapply(xy.split, FUN = manipulateXY)

xy.new <- do.call("rbind", xy.newvar)
xy.new
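As for why the loop in the question appears to do nothing: noi is a temporary copy that is overwritten on every iteration and never written back to the original data frame. A minimal sketch of a loop that does write back follows. The toy data frame dat is made up for illustration and only borrows the question's column names (No, Onset).

```r
# Hypothetical toy data using the question's column names (No, Onset)
dat <- data.frame(No    = rep(1:3, each = 4),
                  Onset = c(0, 0, 9, 9,  9, 9, 0, 0,  0, 9, 9, 9))

dat$provoke <- NA_integer_
for (i in unique(dat$No)) {
  rows <- which(dat$No == i)          # row indices of case i, in data order
  first.onset <- dat$Onset[rows[1]]   # first recorded Onset for this case
  # assign into dat itself, not into a temporary subset
  dat$provoke[rows] <- if (first.onset == 0) 0L else 1L
}
dat
```

Indexing by name (dat$Onset) rather than by position (noi[1, 4]) also keeps the code working if the column order changes.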

Another approach is the following. It assumes the data are ordered by case, and it reuses the first oldvar value directly, which works here because oldvar is already coded 0/1.

# find first occurrence 
zero.or.not <- do.call("rbind", lapply(xy.split, FUN = function(x) x[1, ]))$oldvar

# count number of rows
num.rows <- unlist(lapply(xy.split, FUN = nrow))

xy.new$newvar2 <- rep(zero.or.not, times = num.rows)
xy.new
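If the rows are not ordered by case, a sketch using base R's ave() may be simpler, since it returns results in the original row order. The data frame xy2 below is a made-up example, and the comparison v[1] != 0 is an assumption that any nonzero first value should be coded 1.

```r
# Hypothetical example with cases deliberately interleaved
xy2 <- data.frame(case   = c(2, 1, 2, 1, 3, 3),
                  oldvar = c(1, 0, 0, 1, 0, 1))

# For each case, look at its first value (in row order) and code it 0 or 1;
# ave() recycles the single result across all rows of that case
xy2$newvar <- ave(xy2$oldvar, xy2$case,
                  FUN = function(v) as.numeric(v[1] != 0))
xy2
```

Note that "first" here means first in row order within each case, so the rows of a case still need to be in measurement order.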

Upvotes: 1
