Reputation: 10163
dput(mydf)
structure(list(urls = c("/players/a/abdulma02.html",
"/players/a/abdulta01.html",
"/players/a/abdursh01.html", "/players/a/alexaco01.html", "/players/a/alexaco02.html"
), names = c("Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef Abdur-Rahim",
"Cory Alexander", "Courtney Alexander")), row.names = c(NA, 5L
), class = "data.frame")
head(mydf)
                       urls               names
1 /players/a/abdulma02.html  Mahmoud Abdul-Rauf
2 /players/a/abdulta01.html   Tariq Abdul-Wahad
3 /players/a/abdursh01.html Shareef Abdur-Rahim
4 /players/a/alexaco01.html      Cory Alexander
5 /players/a/alexaco02.html  Courtney Alexander
My problem is simple: I would like to extract the part of each URL before .html (abdulma02, abdulta01, etc.). The data is formatted such that the ending will always be .html, and the start will always be /players/{single letter}/{what I want}.html
I have taken a stab at this using the new urltools library with no success (I tried their urltools::suffix_extract() function). Any help here is appreciated.
Upvotes: 1
Views: 41
Reputation: 887088
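We can use basename() to drop the directory part of each path and tools::file_path_sans_ext() to drop the .html extension: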
tools::file_path_sans_ext(basename(mydf$urls))
#[1] "abdulma02" "abdulta01" "abdursh01" "alexaco01" "alexaco02"
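For comparison, here is a minimal sketch of the same extraction done with a regular expression in base R. It assumes every URL follows the /players/{single letter}/{id}.html pattern described in the question; the names ids and ids_regex are illustrative only, and the data frame is rebuilt from two of the question's rows so the snippet runs on its own.

```r
# Two rows from the question's data, rebuilt so the snippet is self-contained
mydf <- data.frame(
  urls = c("/players/a/abdulma02.html", "/players/a/abdulta01.html"),
  stringsAsFactors = FALSE
)

# Path-based approach: strip the directory, then strip the extension
ids <- tools::file_path_sans_ext(basename(mydf$urls))

# Regex-based alternative (assumes the fixed /players/{letter}/{id}.html layout):
# capture everything between the single-letter directory and ".html"
ids_regex <- sub("^/players/[a-z]/(.*)\\.html$", "\\1", mydf$urls)

# Both approaches should agree on this data
identical(ids, ids_regex)
```

The path-based version does not depend on the directory layout, so it is likely the safer choice if the URL prefix ever changes; the regex version leaves any URL that does not match the pattern unchanged.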
Upvotes: 1