Revan

Reputation: 2322

Remove elements from a character vector (by exact name and by name prefix)

I have a long character vector of protein names which I want to reduce.

I want to remove from the vector all entries that are exactly equal to "5-FCL-like_protein" and all entries that start with "CBSS-".

For the first problem, I can just use %in%:

remove <- c("5-FCL-like_protein")
vec[! vec %in% remove]

But how can I include the entries that start with "CBSS-" as well?

Thank you.

Upvotes: 0

Views: 216

Answers (2)

akrun

Reputation: 887048

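Alternatively, a single grep call with invert = TRUE can drop both kinds of entries. The pattern matches strings that start with "CBSS-" or that are exactly "5-FCL-like_protein"; invert = TRUE returns everything that does not match.

grep("^(CBSS-|5-FCL-like_protein$)", vec, value = TRUE, invert = TRUE)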
#[1] "Protein1" "Protein2"

data

vec <- c("Protein1","Protein2", "CBSS-Protein 2", "5-FCL-like_protein")

Upvotes: 1

Mike H.

Reputation: 14360

You can combine two conditions in your subset. The first does the same job as your %in%; I use a plain comparison instead purely out of personal preference. If you have multiple strings you want to exclude, you can go back to %in%. The second uses grepl to match "CBSS-" at the beginning of the string.

vec <- c("Protein1","Protein2", "CBSS-Protein 2", "5-FCL-like_protein")
vec[vec != "5-FCL-like_protein" & !grepl("^CBSS-", vec)]
#[1] "Protein1" "Protein2"
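
If the list of exact names or prefixes grows, the same idea could be generalized roughly as below. This is only a sketch: exact_names and prefixes are hypothetical variable names, not something from the question, and it assumes base R 3.3.0 or later (for startsWith) and a vector without NA values.

```r
vec <- c("Protein1", "Protein2", "CBSS-Protein 2", "5-FCL-like_protein")

# Hypothetical lists of what to drop; extend as needed
exact_names <- c("5-FCL-like_protein")
prefixes    <- c("CBSS-")

# TRUE where an element starts with any of the prefixes
# (startsWith does a literal comparison, so no regex escaping is needed)
has_prefix <- Reduce(`|`, lapply(prefixes, function(p) startsWith(vec, p)))

# Keep elements that are neither an exact match nor prefixed
result <- vec[!(vec %in% exact_names) & !has_prefix]
print(result)
#[1] "Protein1" "Protein2"
```

Because startsWith compares literally, characters such as "." or "(" in a prefix would not be interpreted as regex syntax, which may be preferable to grepl when the prefixes come from data.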

Upvotes: 2
