user8959427

Reputation: 2067

gsub / sub to extract between certain characters

How can I extract the numbers / ID from the following string in R?

link <- "D:/temp/sample_data/0000098618-13-000011.htm"

I want to just extract 0000098618-13-000011

That is, discard the .htm extension and the D:/temp/sample_data/ prefix.

I have tried grep and gsub without much luck.

Upvotes: 1

Views: 263

Answers (2)

s_baldur

Reputation: 33613

Using stringr:

library(stringr)
str_extract(link, "[0-9-]+")

# "0000098618-13-000011"
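One caveat: the pattern [0-9-]+ returns the first run of digits and hyphens anywhere in the string, so a directory name containing a digit or a hyphen would be matched before the file name is reached. Below is a minimal sketch of a more defensive variant, written in base R so it runs without extra packages. The helper extract_id and the path link2 are made up for illustration and are not part of the question.

```r
# Hypothetical helper: take the first run of digits/hyphens from the file name only.
extract_id <- function(path) {
  file <- basename(path)                      # drop the directory part first
  regmatches(file, regexpr("[0-9-]+", file))  # first match within the file name
}

link  <- "D:/temp/sample_data/0000098618-13-000011.htm"
link2 <- "D:/temp/run-2/0000098618-13-000011.htm"  # made-up path with "-2" in a directory

id1 <- extract_id(link)   # "0000098618-13-000011"
id2 <- extract_id(link2)  # "0000098618-13-000011"

# Without basename(), the first match in link2 comes from the directory name.
naive2 <- regmatches(link2, regexpr("[0-9-]+", link2))  # "-2"
```

Note that regmatches drops elements with no match, so for a vector input the result can be shorter than the input.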

Upvotes: 0

G. Grothendieck

Reputation: 270248

1) basename. Use basename followed by sub:

sub("\\..*", "", basename(link))
## [1] "0000098618-13-000011"

2) file_path_sans_ext. This only strips the extension, so apply basename first to drop the directory:

library(tools)
file_path_sans_ext(basename(link))
## [1] "0000098618-13-000011"

3) sub

sub(".*/(.*)\\..*", "\\1", link)
## [1] "0000098618-13-000011"

4) gsub

gsub(".*/|\\.[^.]*$", "", link)
## [1] "0000098618-13-000011"

5) strsplit

sapply(strsplit(link, "[/.]"), function(x) tail(x, 2)[1])
## [1] "0000098618-13-000011"

6) read.table. If link is a vector, this only works if all elements have the same number of /-separated components. It also assumes that the only dot is the one separating the extension.

DF <- read.table(text = link, sep = "/", comment = ".", as.is = TRUE)
DF[[ncol(DF)]]
## [1] "0000098618-13-000011"
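The approaches agree on the example, but they can differ when a file name contains more than one dot. Here is a small sketch comparing the basename/sub idea with the file_path_sans_ext idea on a vector. The extra paths in links are made up for illustration.

```r
library(tools)  # ships with R; provides file_path_sans_ext()

# Made-up vector of paths; the last file name contains two dots.
links <- c("D:/temp/sample_data/0000098618-13-000011.htm",
           "D:/temp/other/0000098618-13-000012.htm",
           "D:/temp/other/report.v2.htm")

# sub("\\..*", ...) removes everything from the first dot onward.
first_dot <- sub("\\..*", "", basename(links))

# file_path_sans_ext() removes only the final extension.
last_dot <- file_path_sans_ext(basename(links))

first_dot[3]  # "report"
last_dot[3]   # "report.v2"
```

Which behaviour is wanted depends on the file names. For IDs like the one in the question, which contain no dots, both give the same result.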

Upvotes: 3
