Canovice

Reputation: 10163

Extract part of URL in R

dput(mydf)
structure(list(urls = c("/players/a/abdulma02.html", "/players/a/abdulta01.html",
    "/players/a/abdursh01.html", "/players/a/alexaco01.html", "/players/a/alexaco02.html"),
    names = c("Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef Abdur-Rahim",
    "Cory Alexander", "Courtney Alexander")),
    row.names = c(NA, 5L), class = "data.frame")

head(mydf)
                       urls               names
1 /players/a/abdulma02.html  Mahmoud Abdul-Rauf
2 /players/a/abdulta01.html   Tariq Abdul-Wahad
3 /players/a/abdursh01.html Shareef Abdur-Rahim
4 /players/a/alexaco01.html      Cory Alexander
5 /players/a/alexaco02.html  Courtney Alexander

My problem is simple: I would like to extract the part of each URL before the .html extension (abdulma02, abdulta01, etc.). The data is formatted so that every URL ends in .html and starts with /players/{single letter}/, i.e. each URL has the form /players/{single letter}/{what I want}.html.
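Given that fixed layout, one minimal base-R sketch is a single regular-expression substitution with sub(). It assumes every URL matches /players/{single letter}/{id}.html exactly; the urls vector below is just the example data copied from the question, and any URL that does not match the pattern is returned unchanged by sub() rather than raising an error.

```r
# Example URLs copied from the question
urls <- c("/players/a/abdulma02.html", "/players/a/abdulta01.html",
          "/players/a/abdursh01.html", "/players/a/alexaco01.html",
          "/players/a/alexaco02.html")

# Capture the segment between "/players/<one letter>/" and the trailing ".html".
# Assumption: every URL follows this layout; non-matching URLs come back unchanged.
ids <- sub("^/players/[[:alpha:]]/([^/]+)\\.html$", "\\1", urls)
print(ids)
```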

I tried this with the new urltools package (specifically its urltools::suffix_extract() function), without success. Any help is appreciated.

Upvotes: 1

Views: 41

Answers (1)

akrun

Reputation: 887088

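We can use basename() to drop the directory part of each path and then tools::file_path_sans_ext() to remove the .html extension: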

tools::file_path_sans_ext(basename(mydf$urls))
#[1] "abdulma02" "abdulta01" "abdursh01" "alexaco01" "alexaco02"
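To make each step of that one-liner visible, the sketch below rebuilds the example data and stores the result in a new column. The column name id is hypothetical, chosen only for illustration.

```r
# Rebuild the example data frame from the question
mydf <- data.frame(
  urls = c("/players/a/abdulma02.html", "/players/a/abdulta01.html",
           "/players/a/abdursh01.html", "/players/a/alexaco01.html",
           "/players/a/alexaco02.html"),
  names = c("Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef Abdur-Rahim",
            "Cory Alexander", "Courtney Alexander"),
  stringsAsFactors = FALSE
)

# Step 1: basename() keeps only the part after the last "/", e.g. "abdulma02.html"
files <- basename(mydf$urls)

# Step 2: file_path_sans_ext() strips the trailing extension, e.g. "abdulma02"
# ("id" is a hypothetical column name used for illustration)
mydf$id <- tools::file_path_sans_ext(files)

print(mydf)
```

Because basename() and file_path_sans_ext() do not depend on the /players/{letter}/ prefix, this approach keeps working if the directory part of the path changes, as long as the file name still ends in a single extension such as .html.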

Upvotes: 1
