RoyBatty

Reputation: 326

Remove every substring after a given character in a list

I have a list that has the following structure:

[1] "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"

I want to extract the third element (splitting on the | symbol) and remove everything in that string from the ( symbol onward.

So I would get this string:

[1] "Concentration"

First I split on the | symbol, then take the third element of the resulting list. To be able to use gsub I convert it to character, and then I apply the gsub function, as follows.

y <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"
y <- strsplit(y,  "\\|")
y <- y[[1]][3]
y <- as.character(y)
gsub("(.*","",y)

However, this raises the following error:

invalid regular expression '(.*', reason 'Missing ')''

Upvotes: 0

Views: 48

Answers (2)

benson23

Reputation: 19097

First of all, you don't need y <- as.character(y), since the result would already be of class "character".

Second, your problem lies in the pattern inside gsub(): an unescaped ( opens a regex group, so you need to escape the opening parenthesis. Therefore your full code should be:

y <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"
y <- strsplit(y,  "\\|")
y <- y[[1]][3]
gsub("\\(.*","",y)

[1] "Concentration"
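
Since the question title mentions a list of strings, here is a minimal sketch of the same idea applied to a whole character vector at once. The second input string is made up purely for illustration, and the names fields and result are not from the original post. It uses fixed = TRUE in strsplit() so that | needs no escaping, and sub() rather than gsub() because only one match per string is expected.

```r
# Sample input; the second string is invented for illustration only.
y <- c("Atp|Barcelona|Concentration(ng/mL)|8|FALSE",
       "Atp|Madrid|Volume(mL)|3|TRUE")

# Split each string on a literal "|" and take the third piece of each.
# A string with fewer than three pieces yields NA here.
fields <- vapply(strsplit(y, "|", fixed = TRUE), `[`, character(1), 3)

# Drop everything from the first "(" onward; "\\(" escapes the parenthesis.
result <- sub("\\(.*", "", fields)
result
```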

Upvotes: 0

Tim Biegeleisen

Reputation: 521279

You may use strsplit with unlist here:

x <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"
output <- unlist(strsplit(x, "\\|"))[3]
output

[1] "Concentration(ng/mL)"

If some inputs might not have at least two | separators, then you may first check the length of the vector returned above before trying to access the third element.
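
The length check mentioned above could look roughly like this. It is only a sketch: third_field is a hypothetical helper name, and returning NA for short inputs is an assumption about the desired behaviour, not something stated in the question. It also strips the parenthesised suffix so the result matches what the question asks for.

```r
# Hypothetical helper: returns the third |-separated field with any
# parenthesised suffix removed, or NA if there are fewer than three fields.
third_field <- function(x) {
  parts <- unlist(strsplit(x, "|", fixed = TRUE))
  if (length(parts) < 3) {
    return(NA_character_)  # assumed fallback for malformed input
  }
  sub("\\(.*", "", parts[3])
}

third_field("Atp|Barcelona|Concentration(ng/mL)|8|FALSE")
third_field("Atp|Barcelona")
```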

Upvotes: 1
