Reputation: 2322
I have a long character vector of protein names which I want to reduce.
I want to remove from the vector all entries that are equal to "5-FCL-like_protein" and all entries that start with "CBSS-".
For the first problem, I can just use %in%:
remove <- c("5-FCL-like_protein")
vec[! vec %in% remove]
But how can I include the entries that start with "CBSS-" as well?
Thank you.
Upvotes: 0
Views: 216
Reputation: 887048
Or we can combine both conditions into a single regex with grep, where invert = TRUE returns the elements that do not match:
vec <- c("Protein1","Protein2", "CBSS-Protein 2", "5-FCL-like_protein")
grep("^(CBSS-|5-FCL-like_protein$)", vec, value = TRUE, invert = TRUE)
#[1] "Protein1" "Protein2"
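If the list of exact names and prefixes grows, the pattern can be built from vectors instead of being typed by hand. This is a minimal sketch: the vectors exact and prefixes are hypothetical names, and it assumes none of the strings contain regex metacharacters (such as . + ( ) [ ]), which would otherwise need escaping:

```r
# Hypothetical vectors of exact names and prefixes to drop
exact    <- c("5-FCL-like_protein")
prefixes <- c("CBSS-")

# Build one alternation: prefixes are anchored only at the start (^),
# exact names are anchored at both ends (^ ... $).
# Assumes the strings contain no regex metacharacters.
pattern <- paste0("^(", paste(c(prefixes, paste0(exact, "$")), collapse = "|"), ")")
# pattern is "^(CBSS-|5-FCL-like_protein$)"

vec  <- c("Protein1", "Protein2", "CBSS-Protein 2", "5-FCL-like_protein")
kept <- grep(pattern, vec, value = TRUE, invert = TRUE)
kept
#[1] "Protein1" "Protein2"
```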
Upvotes: 1
Reputation: 14360
You can use two conditions in your subset. The first one is very similar to your %in%, except that I use == instead, purely out of personal preference. If you have multiple strings you want to exclude, you can go back to %in%. The second one uses grepl to match "CBSS-" at the beginning of the string.
vec <- c("Protein1","Protein2", "CBSS-Protein 2", "5-FCL-like_protein")
vec[!vec == "5-FCL-like_protein" & !grepl("^CBSS-", vec)]
#[1] "Protein1" "Protein2"
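A regex-free variant of the same two-condition subset, sketched under the assumption that base R's startsWith() (available since R 3.3.0) is acceptable: %in% handles any number of exact names and startsWith() handles any number of prefixes, so special characters in the strings need no escaping. The vector name prefixes is hypothetical:

```r
remove   <- c("5-FCL-like_protein")  # exact names to drop
prefixes <- c("CBSS-")               # hypothetical vector of prefixes to drop

vec <- c("Protein1", "Protein2", "CBSS-Protein 2", "5-FCL-like_protein")

# One logical vector per prefix, OR-ed together; the initial FALSE
# keeps this valid even if prefixes is empty.
has_prefix <- Reduce(`|`, lapply(prefixes, function(p) startsWith(vec, p)), FALSE)

kept <- vec[!(vec %in% remove) & !has_prefix]
kept
#[1] "Protein1" "Protein2"
```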
Upvotes: 2