Reputation: 323
I have a bunch of urls in a column and I need to create a new variable to extract a specific unique id for each url. The unique id occurs after the equal sign. For example:
https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13
So the unique id variable would be: 29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13
I think I can do this with str_extract and a regex, but I don't know which pattern to use:
data %>%
mutate(unique_id = str_extract(url, " "))
Upvotes: 1
Views: 2352
Reputation: 389175
Using str_extract:
url <- 'https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13'
stringr::str_extract(url, '(?<==).*')
#[1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
which is the same as
stringr::str_match(url, '=(.*)')[, 2]
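Since the question asks for a new column, here is a minimal sketch of the same lookbehind pattern inside mutate. The data frame and its second row are made up for illustration; the column names url and unique_id follow the question. It assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical example data; the second URL contains no "=" at all
data <- tibble(url = c(
  "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13",
  "https://website.com/about.php"
))

# str_extract is vectorised, so it works directly on the whole column
data <- data %>%
  mutate(unique_id = str_extract(url, "(?<==).*"))

data$unique_id
#> [1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13" NA
```

Rows without an equals sign come back as NA rather than the original string, which makes missing ids easy to spot.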
Upvotes: 3
Reputation: 26238
Match everything up to the first = and capture everything in a group after that.
str <- 'https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13'
gsub('^[^\\=]*\\=(.*)$', '\\1', str)
#> [1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
Created on 2021-05-31 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 522396
Assuming every URL only ever has one query parameter, you can use sub here:
url <- "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
param <- sub("^.*=", "", url)
param
[1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
If there could be multiple query parameters and you want the one labelled l, you can use sub with a capture group:
url <- "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
# [^&]* captures the value of l; the trailing .*$ drops any later parameters
param <- sub("^.*\\bl=([^&]*).*$", "\\1", url, perl=TRUE)
param
[1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13"
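To check the multi-parameter case on more than one URL, the capture-group idea can be wrapped in a small function. This is only a sketch in base R: get_query_param is a hypothetical helper name, the second and third URLs are invented test inputs, and the parameter name is assumed to contain no regex metacharacters.

```r
# Hypothetical helper: return the value of query parameter `name`, or NA if absent
get_query_param <- function(url, name) {
  # [?&] anchors the name at the start of a parameter;
  # [^&#]* captures the value up to the next parameter or fragment
  pattern <- paste0("^.*[?&]", name, "=([^&#]*).*$")
  # sub() returns its input unchanged when nothing matches, so guard with grepl()
  ifelse(grepl(pattern, url), sub(pattern, "\\1", url), NA_character_)
}

urls <- c(
  "https://website.com/locationDetails.php?l=29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13",
  "https://website.com/locationDetails.php?page=2&l=ABC-123&sort=asc",
  "https://website.com/locationDetails.php?page=2"
)

get_query_param(urls, "l")
#> [1] "29A5CDCA-7D0F-4FAA-906C-00DA90EBFD13" "ABC-123"
#> [3] NA
```

The grepl guard matters because a URL that lacks the parameter would otherwise be returned whole and could be mistaken for an id.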
Upvotes: 1