Reputation: 2067
How can I extract the numbers / ID from the following string in R?
link <- "D:/temp/sample_data/0000098618-13-000011.htm"
I want to just extract 0000098618-13-000011
That is discard the .htm
and the D:/temp/sample_data/
.
I have tried grep and gsub without much luck.
Upvotes: 1
Views: 263
Reputation: 33613
Using stringr
:
library(stringr)
str_extract(link , "[0-9-]+")
# "0000098618-13-000011"
Upvotes: 0
Reputation: 270248
1) basename Use basename
followed by sub
:
sub("\\..*", "", basename(link))
## [1] "0000098618-13-000011"
2) file_path_sans_ext
library(tools)
file_path_sans_ext(link)
## [1] "0000098618-13-000011"
3) sub
sub(".*/(.*)\\..*", "\\1", link)
## [1] "0000098618-13-000011"
4) gsub
gsub(".*/|\\.[^.]*$", "", link)
## [1] "0000098618-13-000011"
5) strsplit
sapply(strsplit(link, "[/.]"), function(x) tail(x, 2)[1])
## [1] "0000098618-13-000011"
6) read.table. If link
is a vector this will only work if all elements have the same number of /-separated components. Also this assumes that the only dot is the one separting the extension.
DF <- read.table(text = link, sep = "/", comment = ".", as.is = TRUE)
DF[[ncol(DF)]]
## [1] "0000098618-13-000011"
Upvotes: 3