I'm trying to write code that excludes certain values from sets of data/numbers.
I have written the following:
x <- c("1407741214DAG359", "2211682828DAG359", "1304410201DAG359", "0908700465DAG36", "0909700565G379")
y <- c("1407741214DAG359", "2211682828DAG359", "1304410201DAG359", "0","0")
Here I wish to exclude the values that contain DAG36 or G379.
I tried writing the following:
newdata.x <- x[ x != "DAG36", "G379" ]
However, the code only seems to exclude values that consist exactly of DAG36 or G379, not every value that contains either DAG36 or G379.
Would any of you be able to help me?
Upvotes: 1
Views: 57
Reputation: 7164
What you are searching for is grep() or grepl(). Both functions search for a pattern within a string or, in your case, within each element of a vector of strings.
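To see why the original attempt falls short, here is a small comparison using the x from the question. The operator != tests whether each whole string equals "DAG36", whereas grepl() tests whether "DAG36" occurs anywhere inside each string. This is a sketch for illustration only; the variable names exact and partial are made up.

```r
x <- c("1407741214DAG359", "2211682828DAG359", "1304410201DAG359",
       "0908700465DAG36", "0909700565G379")

# `!=` compares each complete string with "DAG36". No element is exactly
# "DAG36", so every comparison is TRUE and nothing would be excluded.
exact <- x != "DAG36"
print(exact)

# grepl() checks for "DAG36" as a substring, so only the 4th element matches
# ("DAG359" does not contain "DAG36").
partial <- grepl("DAG36", x)
print(partial)
```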
The pattern you are looking for is DAG36 or G379. In a regular expression, | means "or", so you can express this as DAG36|G379.
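If the list of codes to exclude grows, the alternation pattern does not have to be typed by hand. A minimal sketch, assuming the codes are kept in a character vector (the name codes is hypothetical): paste() with collapse = "|" joins them into a single regular expression.

```r
# Hypothetical vector of codes to exclude; extend it as needed.
codes <- c("DAG36", "G379")

# Join the codes with "|" so the regex matches any one of them.
pattern <- paste(codes, collapse = "|")

x <- c("1407741214DAG359", "2211682828DAG359", "1304410201DAG359",
       "0908700465DAG36", "0909700565G379")

# Positions of the strings containing any of the codes.
hits <- grep(pattern, x)
print(pattern)
print(hits)
```

This assumes the codes contain only letters and digits. A code containing regex metacharacters such as . or + would need to be escaped before being pasted into the pattern.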
grep("DAG36|G379", x)
# [1] 4 5
grepl("DAG36|G379", x)
# [1] FALSE FALSE FALSE TRUE TRUE
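As you see, grep() returns the positions of the matches while grepl() returns a logical vector, but both select the same elements when used for indexing, so here they can be used interchangeably. Now you can use indexing either to replace the relevant strings with a zero or to remove them: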
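# Each line below is an alternative; apply one of them to the original x
x[ grepl("DAG36|G379", x) ] <- 0  # Replace matches with "0" (coerced to character); this reproduces y
x <- x[ !grepl("DAG36|G379", x) ]  # Easier version: remove matching strings by negating the logical index
x <- grep("DAG36|G379", x, invert = TRUE, value = TRUE)  # More direct version: return only non-matching strings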
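One caveat when removing matches with a negative grep() index instead of !grepl(): if nothing matches, grep() returns integer(0), and x[-integer(0)] selects no elements at all, so every string is silently dropped. The sketch below demonstrates this with a hypothetical pattern, "XYZ999", that matches nothing in x, and then checks that the removal approaches agree when there are matches.

```r
x <- c("1407741214DAG359", "2211682828DAG359", "1304410201DAG359",
       "0908700465DAG36", "0909700565G379")

# Hypothetical pattern that matches none of the strings.
no_match <- "XYZ999"

# grep() gives integer(0); negating it still gives integer(0),
# so this indexing returns an empty vector.
dropped_all <- x[-grep(no_match, x)]

# Negating grepl() gives all TRUE, so every element is kept.
kept_all <- x[!grepl(no_match, x)]

# With the real pattern, the two removal approaches return the same strings.
pattern <- "DAG36|G379"
via_grepl <- x[!grepl(pattern, x)]
via_invert <- grep(pattern, x, invert = TRUE, value = TRUE)

print(length(dropped_all))
print(length(kept_all))
print(identical(via_grepl, via_invert))
```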
Upvotes: 3