lolo

Reputation: 646

Regular Expression - Extract Word in R

How can I extract MLA723950998 from this string?

"https://auto.mercadolibre.com.ar/MLA-723950998-peugeot-208-0km-16-active-plan-100-financiado-darc-_JM"

So far, I have only managed to extract MLA:

gsub('.*(M\\w+).*', '\\1', "https://auto.mercadolibre.com.ar/MLA-723950998-peugeot-208-0km-16-active-plan-100-financiado-darc-_JM")

MLA
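The attempt above stops at MLA because \w matches only letters, digits and underscores, so M\w+ ends at the first hyphen. A quick base R check of that behaviour (the test strings in the second call are illustrative, not from the question):

```r
x <- "https://auto.mercadolibre.com.ar/MLA-723950998-peugeot-208-0km-16-active-plan-100-financiado-darc-_JM"

# M\w+ takes "M" plus the following word characters and stops at the first non-word character
first_token <- regmatches(x, regexpr("M\\w+", x))
print(first_token)

# \w covers letters, digits and underscores, but not the hyphen
is_word <- grepl("^\\w+$", c("MLA", "MLA_723950998", "MLA-723950998"))
print(is_word)
```

So the hyphen and the digits after it have to be matched explicitly for the number to end up in the result.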

Upvotes: 0

Views: 74

Answers (2)

Wiktor Stribiżew

Reputation: 626748

You may use

.*/(M\w+)-(\d+).*

and replace with \1\2.

Details

  • .*/ - zero or more characters of any kind, as many as possible, up to and including the last / in the string
  • (M\w+) - Group 1 (referred to later with the \1 placeholder in the replacement pattern): M followed by one or more letters, digits or underscores
  • - - a hyphen
  • (\d+) - Group 2 (referred to later with the \2 placeholder in the replacement pattern): one or more digits
  • .* - the rest of the string.


See the R demo:

x <- "https://auto.mercadolibre.com.ar/MLA-723950998-peugeot-208-0km-16-active-plan-100-financiado-darc-_JM"
gsub('.*/(M\\w+)-(\\d+).*', '\\1\\2', x)
# => [1] "MLA723950998"
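Because the pattern consumes the whole string, there is only one match per string, so sub() does the same job as gsub(). One caveat worth handling when this is applied to many URLs: a string that does not match is returned unchanged rather than as an empty result. The sketch below flags such strings as NA; the second URL is a hypothetical example without an ID.

```r
urls <- c(
  "https://auto.mercadolibre.com.ar/MLA-723950998-peugeot-208-0km-16-active-plan-100-financiado-darc-_JM",
  "https://example.com/no-id-here"  # hypothetical URL with no ID in it
)
pattern <- ".*/(M\\w+)-(\\d+).*"

# sub() is sufficient: the pattern matches the entire string at most once
ids <- sub(pattern, "\\1\\2", urls)

# Non-matching strings come back unchanged, so mark them as missing explicitly
ids[!grepl(pattern, urls)] <- NA
print(ids)
# [1] "MLA723950998" NA
```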

Upvotes: 1

Manuel Bickel

Reputation: 2206

Maybe this solution works for you:

library(stringi)
x = "https://auto.mercadolibre.com.ar/MLA-723950998-peugeot-208-0km-16-active-plan-100-financiado-darc-_JM"
stri_extract_last_regex(x, "(?<=/)([A-Za-z]+.\\d+)(?=[^/]+$)")
# => [1] "MLA-723950998"

(i) The lookbehind (?<=/) requires the match to start right after a slash; (ii) the match itself consists of one or more letters, exactly one character of any kind, and one or more digits; (iii) the lookahead (?=[^/]+$) requires that only non-slash characters follow, up to the end of the string.
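This approach keeps the hyphen, while the question asks for MLA723950998. A minimal base R sketch, assuming stringi is not available: the same pattern works with regexpr(..., perl = TRUE), which supports the lookarounds. Note that regexpr() returns the first match rather than the last one as stri_extract_last_regex() does; for this URL there is only one candidate, so the result is the same.

```r
x <- "https://auto.mercadolibre.com.ar/MLA-723950998-peugeot-208-0km-16-active-plan-100-financiado-darc-_JM"

# perl = TRUE enables the lookbehind/lookahead syntax used in the stringi pattern
m <- regmatches(x, regexpr("(?<=/)([A-Za-z]+.\\d+)(?=[^/]+$)", x, perl = TRUE))
print(m)

# Drop the hyphen between the prefix and the digits to get the requested form
id <- sub("-", "", m, fixed = TRUE)
print(id)
```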

Upvotes: 1
