Lennon Lee

Reputation: 244

Extract multiple string patterns and return them as a single value in R

I have a string:

test<-c("Compound.name:Cyclohexylamine;CAS.ID:108-91-8;HMDB.ID:HMDB31404;KEGG.ID:C00571;Lab.ID:shen_4881_HMDB31404;Adduct:(M+H)+;mz.error:0.5703867;mz.match.score:0.9997398;RT.error:NA;RT.match.score:NA;CE:Unknown_1;SS:0.52575;Total.score:0.7034962;Database:hmdbDatabase0.0.1
120 Levels: Compound.name:3-Dehydroxycarnitine;CAS.ID:;HMDB.ID:HMDB06831;KEGG.ID:C05543\t;Lab.ID:shen_3269_HMDB06831;Adduct:(M+H)+;mz.error:0.7554105;mz.match.score:0.9995436;RT.error:NA;RT.match.score:NA;CE:Unknown_1;SS:0.6706675;Total.score:0.793996;Database:hmdbDatabase0.0.1"

I used str_extract to extract multiple patterns from the string:

str_extract(test,pattern = c("Compound\\.name(.*?);","HMDB\\.ID(.*?);","KEGG\\.ID(.*?);","mz\\.match\\.score(.*?);"))

[1] "Compound.name:Cyclohexylamine;" "HMDB.ID:HMDB31404;"             "KEGG.ID:C00571;"               
[4] "mz.match.score:0.9997398;"

I would like the result returned as a single value. How can I do that? For example:

[1] "Compound.name:Cyclohexylamine;HMDB.ID:HMDB31404;KEGG.ID:C00571;mz.match.score:0.9997398;"

Upvotes: 1

Views: 786

Answers (1)

Ronak Shah

Reputation: 389175

You can paste the extracted pieces together with paste0, using an empty string as the collapse argument.

library(stringr)
paste0(str_extract(test, pattern = c("Compound\\.name(.*?);","HMDB\\.ID(.*?);",
       "KEGG\\.ID(.*?);","mz\\.match\\.score(.*?);")), collapse = "")

#[1] "Compound.name:Cyclohexylamine;HMDB.ID:HMDB31404;KEGG.ID:C00571;mz.match.score:0.9997398;"

Alternatively, since you are already using stringr, you can use str_c instead of paste0.
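A minimal sketch of the str_c version, using a shortened copy of the question's string (assuming every field you extract ends in ";"). It also shows the one practical difference between the two: when a pattern finds nothing, str_extract returns NA, which paste0 pastes in as the literal text "NA", while str_c returns NA for the whole result. The no_kegg string is a made-up record used only for illustration.

```r
library(stringr)

# Shortened record from the question (assumption: fields are ";"-terminated)
test <- "Compound.name:Cyclohexylamine;CAS.ID:108-91-8;HMDB.ID:HMDB31404;KEGG.ID:C00571;mz.match.score:0.9997398;Total.score:0.7034962"

patterns <- c("Compound\\.name(.*?);", "HMDB\\.ID(.*?);",
              "KEGG\\.ID(.*?);", "mz\\.match\\.score(.*?);")

# str_c() with collapse = "" joins the four matches like paste0(..., collapse = "")
combined <- str_c(str_extract(test, patterns), collapse = "")
combined

# Hypothetical record with no KEGG.ID field, so one pattern returns NA
no_kegg <- "Compound.name:Cyclohexylamine;HMDB.ID:HMDB31404;mz.match.score:0.9997398;"

# paste0() turns the NA into the text "NA" inside the joined string
with_paste0 <- paste0(str_extract(no_kegg, patterns), collapse = "")

# str_c() propagates the NA, so the whole result is NA
with_str_c <- str_c(str_extract(no_kegg, patterns), collapse = "")
```

If missing fields should simply be skipped, drop the NAs before joining, e.g. with na.omit on the extracted vector.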


To process a whole column, wrap this in a function:

apply_fun <- function(x) {
    # use the argument x, not the global test string
    paste0(str_extract(x, pattern = c("Compound\\.name(.*?);","HMDB\\.ID(.*?);",
           "KEGG\\.ID(.*?);","mz\\.match\\.score(.*?);")), collapse = "")
}

and then apply it to each value in the column with sapply:

sapply(df$column_name, apply_fun)
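A variation on the sapply call above, sketched on a hypothetical two-row data frame (the df contents and the column name annotation are assumptions standing in for the question's data). The column is stored as a factor to mimic the "120 Levels:" output in the question, so it is converted with as.character first. vapply with character(1) guarantees exactly one string per row, and USE.NAMES = FALSE stops the long input strings from being used as names of the result.

```r
library(stringr)

# Hypothetical data frame; "annotation" is an assumed column name
df <- data.frame(
  annotation = c(
    "Compound.name:Cyclohexylamine;HMDB.ID:HMDB31404;KEGG.ID:C00571;mz.match.score:0.9997398;",
    "Compound.name:3-Dehydroxycarnitine;CAS.ID:;HMDB.ID:HMDB06831;KEGG.ID:C05543;mz.match.score:0.9995436;"
  ),
  stringsAsFactors = TRUE  # mimic the factor column from the question
)

patterns <- c("Compound\\.name(.*?);", "HMDB\\.ID(.*?);",
              "KEGG\\.ID(.*?);", "mz\\.match\\.score(.*?);")

# Works on its argument x, so each row is processed independently
apply_fun <- function(x) {
  paste0(str_extract(x, pattern = patterns), collapse = "")
}

# as.character() drops the factor; vapply() returns one unnamed string per row
df$summary <- vapply(as.character(df$annotation), apply_fun,
                     character(1), USE.NAMES = FALSE)
df$summary
```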

Upvotes: 2
