Reputation:
I find the syntax of R's gsub() difficult. Could you please help me extract, for example, "DA VINCI" from "16. DA VINCI_RETOUR"?
I've already tried gsub("_.+$", "", x)
but it only removes the "_" and what follows it, and I would also like to remove everything up to and including the ". ".
Thank you so much for your help!
Upvotes: 0
Views: 310
Reputation: 39657
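The leading .* matches everything at the beginning, \\. matches a literal "." (followed here by a space), (.*) captures everything up to the "_" and stores it in \\1, and the final .* matches the rest. Because the pattern covers the whole string, sub() replaces all of it with the captured group.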
x <- "16. DA VINCI_RETOUR"
sub(".*\\. (.*)_.*", "\\1", x)
#[1] "DA VINCI"
x <- "7. TILLEUL_RETOUR"
sub(".*\\. (.*)_.*", "\\1", x)
#[1] "TILLEUL"
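As a variation on the same idea, here is a minimal sketch that anchors the pattern and uses negated character classes instead of greedy .*, applied to a whole vector at once since sub() is vectorized. It assumes every string has the form "number. NAME_SUFFIX"; the vector name x_all is made up for illustration.

```r
# Hypothetical input vector; both strings are taken from the examples above
x_all <- c("16. DA VINCI_RETOUR", "7. TILLEUL_RETOUR")

# ^[^.]*   everything before the first "."
# \\.\\s*  the "." and any whitespace after it
# ([^_]*)  capture everything up to the first "_"
# _.*$     the "_" and the rest of the string
result <- sub("^[^.]*\\.\\s*([^_]*)_.*$", "\\1", x_all)
result
#[1] "DA VINCI" "TILLEUL"
```

The negated classes stop at the first "." and the first "_", which may behave more predictably than greedy .* if a name ever contains one of those characters further along.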
Upvotes: 2
Reputation: 13309
An alternative that uses strsplit:
gsub("\\d+\\.\\s", "", strsplit(the_string, "_")[[1]][1])
#[1] "DA VINCI"
Data:
the_string <- "16. DA VINCI_RETOUR"
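The [[1]][1] indexing above only handles a single string. A rough sketch of the same approach for several strings, assuming they all follow the "number. NAME_SUFFIX" layout (the names the_strings, pieces and first_part are made up for illustration):

```r
# Hypothetical vector of inputs
the_strings <- c("16. DA VINCI_RETOUR", "7. TILLEUL_RETOUR")

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(the_strings, "_", fixed = TRUE)

# Keep the part before the first "_" from each element
first_part <- vapply(pieces, function(p) p[1], character(1))

# Strip the leading "number. " prefix
cleaned <- sub("^\\d+\\.\\s", "", first_part)
cleaned
#[1] "DA VINCI" "TILLEUL"
```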
Upvotes: 1
Reputation: 887028
Here is one option that uses a capture group to match a word (\\w+) followed by whitespace and another word, and replaces the whole string with the backreference to that group (\\1):
str1 <- "16. DA VINCI_RETOUR"
sub("^\\d+\\.\\s+(\\w+\\s+\\w+)_.*", "\\1", str1)
#[1] "DA VINCI"
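Note that the pattern above expects exactly two words before the "_", so a single-word name such as "7. TILLEUL_RETOUR" would not match and sub() would return it unchanged. A possible sketch that allows one or more words, under the assumption that names consist of letters separated by whitespace (str_all and out are hypothetical names):

```r
# Hypothetical inputs: a two-word name and a one-word name
str_all <- c("16. DA VINCI_RETOUR", "7. TILLEUL_RETOUR")

# [[:alpha:]]+              first word (letters only, an assumption)
# (\\s+[[:alpha:]]+)*       zero or more further words
# \\1 refers to the outer group, i.e. the whole name
out <- sub("^\\d+\\.\\s+([[:alpha:]]+(\\s+[[:alpha:]]+)*)_.*", "\\1", str_all)
out
#[1] "DA VINCI" "TILLEUL"
```

[[:alpha:]] is used here rather than \\w because \\w also matches "_", which is the separator in these strings.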
Upvotes: 2