Reputation: 1751
I've searched extensively on stack overflow without finding an answer to the following question: Im looking for a function that lets me test if some columns contain any of the specified strings.
# I'm looking for heart attacks
infarction <- c("b", "c")
# I'm also looking for strokes
stroke <- c("h", "i")
#sample data set
set.seed(1234)
dat <- data.frame(A = sample(letters[1:9],10,TRUE),
B = sample(letters[1:9],10,TRUE),
C = sample(letters[1:9],10,TRUE),
D = sample(letters[1:9],10,TRUE),
DATE = sample.int(10, size = 10, replace = FALSE))
# I've tried many things. Among them:
# first one using the dplyr package.
infarction = ifelse( (infarction %in% dat[,c("A", "B", "C", "D")]), DATE, NA))
#excluded a few rows from the mutate...
#I've also tried
grep(paste(infarction,collapse="|"), dat[,1:4], value=TRUE), DATE, NA))
To sum up. I can get it to work if i only check in one column or only look after one of the strings. But I need to check if any of the strings is contained in any of the given columns, and in that case return the date value as a new variable.
Any help is greatly appreciated.
Upvotes: 1
Views: 1765
Reputation: 24535
Following may help:
> apply(dat, 1, function(x) ifelse(infarction %in% x[1:5],x[5],NA ))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] " 6" NA " 3" NA " 2" NA " 7" NA " 1" NA
[2,] " 6" "10" " 3" NA " 2" NA " 7" " 9" NA " 4"
In data.frame form:
> as.data.frame(t(apply(dat, 1, function(x) ifelse(infarction %in% x[1:5],x[5],NA ))))
V1 V2
1 6 6
2 <NA> 10
3 3 3
4 <NA> <NA>
5 2 2
6 <NA> <NA>
7 7 7
8 <NA> 9
9 1 <NA>
10 <NA> 4
For stroke:
> as.data.frame(t(apply(dat, 1, function(x) ifelse(stroke %in% x[1:5],x[5],NA ))))
V1 V2
1 <NA> <NA>
2 <NA> <NA>
3 <NA> <NA>
4 <NA> 5
5 2 <NA>
6 8 <NA>
7 <NA> <NA>
8 <NA> 9
9 1 1
10 4 <NA>
Upvotes: 1
Reputation: 14346
It depends on what your desired output is. @Pop's answer is perfectly good I just want to add that your grep
approach was on the right track except you can't give it a data frame in the x
argument. You have to perform it column-wise. This is because grep
will coerce the x
argument to a character and you can see if you type as.character(dat[, 1:4])
that you are grep
ing the wrong string. Instead, try for example
grepres <- lapply(dat[1:4], grep, pattern = paste(infarction,collapse="|"))
If you want to find which rows don't contain the pattern you can do for example
idx <- setdiff(1:nrow(dat), unlist(grepres))
and then
result <- rep(NA, nrow(dat))
result[idx] <- dat$DATE[idx]
Upvotes: 2
Reputation: 12401
If you just want to know if the columns contain the letters, you can apply to each column of your data.frame
the %in%
function:
apply(dat[,-5], 2, '%in%', x=stroke)
If you want to know the position of each letter of stroke
in your columns, you can use the which
function and apply it to each column:
apply(dat[,-5], 2, function(table,x) which(x==table), x=stroke[1])
apply(dat[,-5], 2, function(table,x) which(x==table), x=stroke[2])
or
sapply(stroke, function(y) apply(dat, 2, function(table,x) which(x==table), x=y), simplify=FALSE)
If you want to know the position of all letters of stroke, do this:
apply(dat[,-5], 2, function(x) unlist(sapply(stroke , function(i) which( i == x))))
Upvotes: 3