user12052836

R - gsub() : trouble when trying to extract a string between ". " and "_"

I find R's gsub() syntax very difficult! Could you please help me extract, for example, "DA VINCI" from "16. DA VINCI_RETOUR"?

I've already tried gsub("_.+$", "", x), but it only removes what comes after the "_", and I would also like to remove what comes before the ". ".

Thank you so much for your help!

Upvotes: 0

Views: 310

Answers (3)

GKi

Reputation: 39657

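The leading .* consumes everything at the beginning, \\. followed by a space matches the literal ". ", (.*) matches everything up to the _ and stores it in \\1, and _.* matches the rest. Since the pattern covers the whole string, sub() replaces all of it with the captured group.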

x  <- "16. DA VINCI_RETOUR"
sub(".*\\. (.*)_.*", "\\1", x)
#[1] "DA VINCI"

x  <- "7. TILLEUL_RETOUR"
sub(".*\\. (.*)_.*", "\\1", x)
#[1] "TILLEUL"
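
As a rough sketch of how this behaves on several strings at once: sub() is vectorised, so the same call works on a character vector. The vector below and the stricter pattern are illustrative assumptions, not part of the original answer; the stricter variant assumes every input starts with digits and a dot.

```r
# Hypothetical inputs in the same "<number>. <name>_<suffix>" shape
x <- c("16. DA VINCI_RETOUR", "7. TILLEUL_RETOUR")

# The pattern from the answer, applied to the whole vector
res <- sub(".*\\. (.*)_.*", "\\1", x)
print(res)

# Stricter variant (assumption): anchor at the start, require leading digits,
# and capture only characters that are not underscores
res_strict <- sub("^\\d+\\.\\s+([^_]+)_.*$", "\\1", x)
print(res_strict)
```

The two patterns agree on these inputs. They would differ if a name contained more than one underscore: the greedy (.*) captures up to the last "_", while [^_]+ stops at the first.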

Upvotes: 2

NelsonGon

Reputation: 13309

An alternative that uses strsplit:

gsub("\\d+\\.\\s", "", strsplit(the_string, "_")[[1]][1])
# [1] "DA VINCI"

Data:

the_string <- "16. DA VINCI_RETOUR"
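
The [[1]][1] indexing above handles a single string only. A minimal sketch of the same idea for a character vector follows; the_strings and the helper variable names are hypothetical, and vapply() is one possible way to pick the first piece of each split.

```r
# Hypothetical vector of inputs (not from the original answer)
the_strings <- c("16. DA VINCI_RETOUR", "7. TILLEUL_RETOUR")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(the_strings, "_", fixed = TRUE)

# Keep the first piece of each split
first_piece <- vapply(parts, function(p) p[1], character(1))

# Strip the leading "<digits>. " prefix
res <- sub("^\\d+\\.\\s", "", first_piece)
print(res)
```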

Upvotes: 1

akrun

Reputation: 887028

Here is one option that uses a capture group: it matches a word (\\w+), followed by whitespace and another word, as a single group, and replaces the whole string with the backreference to that group (\\1). This assumes the name consists of exactly two words.

sub("^\\d+\\.\\s+(\\w+\\s+\\w+)_.*", "\\1", str1)

data

str1 <- "16. DA VINCI_RETOUR" 
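
To illustrate the two-word assumption, here is a small sketch. The vector str2 and the generalised pattern are assumptions added for illustration, not part of the original answer. When the pattern does not match, sub() returns the input unchanged.

```r
# Hypothetical inputs: one two-word name and one single-word name
str2 <- c("16. DA VINCI_RETOUR", "7. TILLEUL_RETOUR")

# Two-word pattern from the answer: the single-word input is left as-is
two_words <- sub("^\\d+\\.\\s+(\\w+\\s+\\w+)_.*", "\\1", str2)
print(two_words)

# Variant (assumption): one word followed by zero or more further words
any_words <- sub("^\\d+\\.\\s+(\\w+(\\s+\\w+)*)_.*", "\\1", str2)
print(any_words)
```

Note that \\w also matches "_", so both patterns rely on backtracking to stop the capture before the underscore.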

Upvotes: 2
