dano_

Reputation: 323

R extract everything after = regex

I have a bunch of URLs in a column, and I need to create a new variable that extracts the unique id from each URL. The unique id occurs after the equals sign. For example:

https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13

So the unique id variable would be: 29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13

I think I can do this with str_extract and a regex, but I am not sure which pattern to use:

data %>% 
  mutate(unique_id = str_extract(url, " ")) 

Upvotes: 1

Views: 2352

Answers (3)

Ronak Shah

Reputation: 389175

Using str_extract -

url <- 'https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13'
stringr::str_extract(url, '(?<==).*')
#[1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"

which is the same as

stringr::str_match(url, '=(.*)')[, 2]
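
Since the question is about a whole column of URLs, here is a minimal base R sketch of the same lookbehind pattern applied to a vector. The data frame `data`, its second URL, and the choice of `NA` for rows without an equals sign are assumptions for illustration; `regexpr()` and `regmatches()` are used instead of stringr only so the snippet runs without extra packages.

```r
# Hypothetical data: one URL with an id, one without any "=".
data <- data.frame(
  url = c(
    "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13",
    "https://website.com/about.php"
  ),
  stringsAsFactors = FALSE
)

# The lookbehind (?<==) requires perl = TRUE in base R.
m <- regexpr("(?<==).*", data$url, perl = TRUE)

# regmatches() returns only the matched elements, so fill the rest with NA.
data$unique_id <- NA_character_
data$unique_id[m != -1] <- regmatches(data$url, m)

data$unique_id
```

With stringr, `str_extract()` already returns `NA` for non-matching rows, so the manual `NA` fill is only needed in the base R version.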

Upvotes: 3

AnilGoyal

Reputation: 26238

  • Match everything up to the first = and capture everything after it in a group.
str <- 'https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13'

gsub('^[^\\=]*\\=(.*)$', '\\1', str)
#> [1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"

Created on 2021-05-31 by the reprex package (v2.0.0)
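
To see why anchoring on the first = matters, here is a small sketch comparing a negated character class with a greedy `.*`. The URL `x` with two equals signs is a made-up example, not taken from the question.

```r
# Hypothetical URL containing two "=" signs.
x <- "https://website.com/page.php?a=1&b=2"

# [^=]* cannot cross an "=", so the match stops at the first one.
after_first <- sub("^[^=]*=", "", x)

# .* is greedy, so the match extends to the last "=".
after_last <- sub("^.*=", "", x)

after_first
#> [1] "1&b=2"
after_last
#> [1] "2"
```

If a string contains no = at all, `sub()` and `gsub()` return it unchanged, so a non-matching URL comes back whole rather than as `NA`.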

Upvotes: 0

Tim Biegeleisen

Reputation: 522396

Assuming all URLs will only ever have one query parameter, you may use sub here:

url <- "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
param <- sub("^.*=", "", url)
param

[1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"

Assuming there could be multiple query parameters, and you want the one labelled as l, then we can use sub with a capture group:

url <- "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
# [^&]* captures the value of l; the trailing .*$ consumes any later parameters
param <- sub("^.*\\bl=([^&]*).*$", "\\1", url, perl=TRUE)
param

[1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
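
For the multi-parameter case, the following is a rough sketch of a vectorised helper. The function name `get_param`, the example URLs with a `page` parameter, and the decision to return `NA` when the parameter is missing are all assumptions for illustration. It also assumes `name` contains no regex metacharacters.

```r
# Hypothetical helper: value of query parameter `name`, or NA if it is absent.
get_param <- function(url, name) {
  # [?&] ensures we match a whole parameter name; [^&#]* stops at the next
  # parameter or fragment; the trailing .*$ consumes the rest of the string.
  pattern <- paste0("^.*[?&]", name, "=([^&#]*).*$")
  hit <- grepl(pattern, url, perl = TRUE)
  out <- rep(NA_character_, length(url))
  out[hit] <- sub(pattern, "\\1", url[hit], perl = TRUE)
  out
}

urls <- c(
  "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13&page=2",
  "https://website.com/locationDetails.php?page=2&l=ABC",
  "https://website.com/locationDetails.php?page=2"
)

ids <- get_param(urls, "l")
ids
```

Checking with `grepl()` first avoids the `sub()` behaviour of returning the input unchanged when nothing matches, which would otherwise put a full URL in the id column.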

Upvotes: 1
