user7777508

Reputation: 101

Using ifelse on list

I've just transitioned to using R from SAS and I'm working with a very large data set (half a million observations and 20 thousand variables) that needs quite a bit of recoding. I imagine this is a pretty basic question, but I'm still learning so I'd really appreciate any guidance!

Many of the variables have three instances and each instance has multiple arrays. For this problem, I am using the "History of Father's Illness." There are many illnesses included, but I am primarily interested in CAD (coded as "1").

An example of how the data looks:

n_20107_0_0   n_20107_0_1   n_20107_0_2
         NA            NA            NA
          7             1             8
          4             6             1

I've only included 3 arrays here, but in reality there are close to 20. I did a bit of research and determined that the most efficient way to do this would be to create a list with the variables and then use lapply. This is what I have attempted:

FatherDisease1 <- paste("n_20107_0_", 0:3, sep = "")
lapply(FatherDisease1, transform, FatherCAD_0_0 = ifelse(FatherDisease1 == 1, 1, 0))

I don't quite get the results I am looking for when I do this.

n_20107_0_0   n_20107_0_1   n_20107_0_2   FatherCAD_0_0
         NA            NA            NA               0
          7             1             8               0
          4             6             1               0

What I would like to do is go through all 3 of these columns and, if the person answered 1 in any of them, set "FatherCAD_0_0" to 1; otherwise "FatherCAD_0_0" should be 0. Instead, I only ever end up with 0's. I would also like the NAs to stay as NAs. This is what I would like it to look like:

n_20107_0_0   n_20107_0_1   n_20107_0_2   FatherCAD_0_0
         NA            NA            NA              NA
          7             1             8               1
          4             6             1               1

I've figured out how to do this the "long" way (30+ lines of code -_-) but am trying to get better at writing more elegant and efficient code. Any help would be greatly appreciated!!

Upvotes: 0

Views: 3316

Answers (1)

Mike H.

Reputation: 14370

Assuming your data is in a data.frame, you could use apply to loop over each row and check whether any of the columns you are interested in contains a 1:

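# Names of the columns that hold the father's illness codes
FatherDisease1 <- paste0("n_20107_0_", 0:2)
# Subset first so apply() only converts these columns to a matrix
df$FatherCAD_0_0 <- apply(df[FatherDisease1], 1, function(x) as.integer(any(x == 1)))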

df
#  n_20107_0_0 n_20107_0_1 n_20107_0_2 FatherCAD_0_0
#1          NA          NA          NA            NA
#2           7           1           8             1
#3           4           6           1             1

Data:

df <- structure(list(n_20107_0_0 = c(NA, 7L, 4L), n_20107_0_1 = c(NA, 
1L, 6L), n_20107_0_2 = c(NA, 8L, 1L)), .Names = c("n_20107_0_0", 
"n_20107_0_1", "n_20107_0_2"), row.names = c(NA, -3L), class = "data.frame")
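
With half a million rows, calling a function once per row can be slow, so a vectorized variant built on rowSums may be worth trying. The following is only a sketch on the same toy data. It assumes that a row should be NA only when every array is missing. That differs from the apply version, where a row with no 1 but at least one NA also comes out as NA. The helper names is_cad, father_cad and all_missing are made up for illustration.

```r
# Same toy data as above, rebuilt so the snippet runs on its own
df <- data.frame(
  n_20107_0_0 = c(NA, 7L, 4L),
  n_20107_0_1 = c(NA, 1L, 6L),
  n_20107_0_2 = c(NA, 8L, 1L)
)

FatherDisease1 <- paste0("n_20107_0_", 0:2)

# Logical matrix: TRUE where the code is 1 (CAD), NA where the value is missing
is_cad <- df[FatherDisease1] == 1

# 1 if any array holds a 1, otherwise 0 (missing values are ignored here)
father_cad <- as.integer(rowSums(is_cad, na.rm = TRUE) > 0)

# Assumption: only rows where every array is NA should stay NA
all_missing <- rowSums(!is.na(df[FatherDisease1])) == 0
father_cad[all_missing] <- NA_integer_

df$FatherCAD_0_0 <- father_cad
df
```

If a partially missing row with no 1 should be NA instead of 0, the all_missing line would need to test for any missing value, not only for rows that are missing everywhere.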

Upvotes: 1
