Reputation: 71
Possibly a really simple problem, but my Stata-ingrained brain just can't figure this one out.
I am trying to generate a single 'case-status' variable in R that uses conditional input from multiple variables in a df. I can get it to work conditional on one variable, but am struggling to find a method that includes all variables.
The data looks similar to this:
id var1 var2 var3 .....
1 X Y <NA>
2 Y <NA> <NA>
3 <NA> X X
I can use case <- rep(NA, nrow(df))
followed by case[df$var1 == "X"] <- 1
to return this output:
head(case)
[1] 1 NA NA
But what I really want to know is if there are any instances of X in any of the var variables, so output that looks like this:
head(case)
[1] 1 NA 1
So how can I change case[df$var1 == "X"] <- 1
to loop over all 'var' variables (in reality there are about 400 rather than 3)?
Upvotes: 2
Views: 174
Reputation: 23788
You could try
case <- +!!rowSums(df=="X", na.rm=TRUE)
case[case==0] <- NA
#> case
#[1] 1 NA 1
data
df <- structure(list(id = 1:3, var1 = structure(c(1L, 2L, 2L), .Label =
c("X", "Y"), class = "factor"), var2 = structure(c(1L, NA, 1L),
.Label = "Y", class = "factor"), var3 = structure(c(NA, NA, 1L),
.Label = "X", class = "factor")), .Names = c("id", "var1", "var2", "var3"),
class = "data.frame", row.names = c(NA, -3L))
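In case the `+!!` idiom looks opaque, here is a step-by-step sketch of what that one-liner does. The example data uses plain character columns laid out as in the question (an assumption; the dput above uses factors), and the intermediate names `hits` and `flag` are purely illustrative.

```r
# Example rows as in the question, stored as character columns (assumed)
df <- data.frame(id = 1:3,
                 var1 = c("X", "Y", NA),
                 var2 = c("Y", NA, "X"),
                 var3 = c(NA, NA, "X"),
                 stringsAsFactors = FALSE)

hits <- rowSums(df == "X", na.rm = TRUE)  # number of "X" cells in each row
flag <- !!hits                            # double negation: TRUE where the count is non-zero
case <- +flag                             # unary plus turns TRUE/FALSE into 1/0
case[case == 0] <- NA                     # rows without any "X" become NA
```

Splitting it up like this makes it easier to keep the per-row count (`hits`) around if it turns out to be useful later.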
Upvotes: 2
Reputation: 10401
What about this?
myData <- data.frame(id=1:3, var1=c("X", "Y", NA),
var2=c("Y", NA, "X"), var3=c(NA, NA, "X"),
stringsAsFactors=FALSE)
as.numeric(rowSums(myData[2:4] == "X", na.rm=TRUE) > 0)
Result:
[1] 1 0 1
To get the exact same results as you did (having NA where no "X" is present but at least one NA is present), try this:
ifelse(rowSums(myData[2:4] == "X", na.rm=TRUE) > 0, 1,
ifelse(rowSums(is.na(myData[2:4])) > 0, NA, 0))
Result:
[1] 1 NA 1
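With roughly 400 columns, hard-coding the positions 2:4 gets awkward. A possible variation, assuming the relevant columns all share a name prefix such as "var" (the pattern and the helper names `var_cols` and `has_x` are hypothetical), is to select them by name:

```r
myData <- data.frame(id = 1:3,
                     var1 = c("X", "Y", NA),
                     var2 = c("Y", NA, "X"),
                     var3 = c(NA, NA, "X"),
                     stringsAsFactors = FALSE)

# Pick every column whose name starts with "var" (assumed naming pattern)
var_cols <- grep("^var", names(myData), value = TRUE)

# TRUE for rows with at least one "X" among the selected columns
has_x <- rowSums(myData[var_cols] == "X", na.rm = TRUE) > 0

# 1 where an "X" was found, NA otherwise
case <- ifelse(has_x, 1, NA)
```

Selecting by name also keeps the id column out of the comparison without relying on column order.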
Upvotes: 1
Reputation: 38500
To create a column that flags whether any column in a row contains an "X", one method using any is as follows:
# set up example data
df <- data.frame(id=1:3, var1=c("X", "Y", NA), var2=c("Y", NA, "X"), var3=c(NA, NA, "X"),
stringsAsFactors=FALSE)
df$newVec <- as.integer(apply(df[,-1], 1, function(i) any(i == "X", na.rm=TRUE)))
This returns 1s and 0s. If you instead want NA for rows that contain no "X" but at least one missing value, drop na.rm:
df$newVec <- as.integer(apply(df[,-1], 1, function(i) any(i == "X")))
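The difference between the two calls comes down to how any() treats missing values. A small sketch on two made-up rows (the vector names are illustrative only):

```r
row_with_x    <- c("Y", NA, "X")
row_without_x <- c("Y", NA, NA)

# One known TRUE is enough, so the NA does not matter
found <- any(row_with_x == "X")

# No TRUE and one or more NA: the answer is unknown, so any() returns NA
unknown <- any(row_without_x == "X")

# With na.rm = TRUE the NAs are dropped first, leaving only FALSE
not_found <- any(row_without_x == "X", na.rm = TRUE)
```

So without na.rm, a row gets NA whenever no "X" is found and at least one value is missing, not only when the whole row is missing.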
Here is one way in base R to replace all "X" values with 1s, using a for loop:
for(i in 2:length(df)) df[df[, i] == "X" & !is.na(df[, i]), i] <- 1
You have to include !is.na in order to ignore the missing values. This should be pretty fast, since it's replacement in place.
If your goal is to indicate whether or not a variable has an X, you can use any and sapply:
sapply(df, function(i) any(i == "X"))
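Note that a column with missing values and no "X" comes back as NA from the call above. If a strict TRUE/FALSE per column is preferred, one possible variation (a sketch on made-up data; `col_has_x` is an illustrative name) adds na.rm = TRUE and skips the id column:

```r
# Made-up data: var1 has a missing value and no "X"
df <- data.frame(id = 1:3,
                 var1 = c("Y", "Y", NA),
                 var2 = c("Y", NA, "X"),
                 var3 = c(NA, NA, "X"),
                 stringsAsFactors = FALSE)

# One logical per column: does the column contain at least one "X"?
col_has_x <- sapply(df[-1], function(col) any(col == "X", na.rm = TRUE))
```

The result is a named logical vector, so `names(col_has_x)[col_has_x]` lists the columns that contain an "X".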
Upvotes: 1