pachadotdev

Reputation: 3765

Obtain filename from url in R

I have a URL like http://example.com/files/01234 that, when I click it in the browser, downloads a zip file with a name like file-08.zip

With wget I can download using the real file name by running

wget --content-disposition http://example.com/files/01234

Functions such as basename do not work in this case, for example:

> basename("http://example.com/files/01234")
[1] "01234"

I'd like to obtain just the file name from the URL in R and create a tibble with the zip file names. It doesn't matter whether the solution uses packages or a system(...) command. Any ideas? What I'd like to obtain is something like

url                            | file
--------------------------------------------
http://example.com/files/01234 | file-08.zip
http://example.com/files/03210 | file-09.zip
...

Upvotes: 2

Views: 822

Answers (2)

MrFlick

Reputation: 206197

Using the httr library, you can make a HEAD call and then parse the content-disposition header. For example:

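library(httr)
# Request only the headers, not the file itself
hh <- HEAD("https://example.com/01234567")
# Keep whatever follows "filename=" in the content-disposition header
get_disposition_filename <- function(x) {
  sub(".*filename=", "", headers(x)$`content-disposition`)
}
get_disposition_filename(hh)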

This function doesn't check that the header actually exists, so it's not very robust, but it should work when the server returns an alternate name for the downloaded file.
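
If the missing-header case matters, a slightly more defensive parser might look like the sketch below. It is only an illustration in base R: parse_disposition_filename is a hypothetical helper name, it expects a single header string (or NULL), it handles quoted and unquoted values, and it does not attempt the extended filename*= form.

```r
# Hypothetical helper: extract the file name from one Content-Disposition
# header value. Returns NA when the header or the filename part is missing.
parse_disposition_filename <- function(header) {
  # Missing header: return NA instead of failing
  if (is.null(header) || length(header) == 0 || is.na(header)) {
    return(NA_character_)
  }
  # Capture the value after filename=, with or without surrounding quotes,
  # stopping at a closing quote or a following ";" parameter
  m <- regmatches(
    header,
    regexec('filename="?([^";]+)"?', header, ignore.case = TRUE)
  )[[1]]
  if (length(m) < 2) NA_character_ else trimws(m[2])
}

parse_disposition_filename('attachment; filename="file-08.zip"')
parse_disposition_filename(NULL)
```

With httr it could be called as parse_disposition_filename(headers(hh)$`content-disposition`), where hh is the HEAD response from above.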

Upvotes: 4

pachadotdev

Reputation: 3765

With @Sathish's contribution:

When the URL string doesn't contain the name of the file to download, a valid solution is

system("curl -IXGET -r 0-10 https://example.com/01234567 | grep attachment | sed 's/^.\\+filename=//'")

The idea is to request only the first few bytes of the zip (the byte range 0-10) instead of the full file, and to take the file name from the response headers. It returns file-789456.zip, or whatever the real zip name is for that URL.
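
To get from that output to the url | file table in the question, the header lines can be captured in R rather than printed. The following is a rough sketch, not a tested recipe: filename_from_header_lines, fetch_header_lines and build_file_table are hypothetical names, and the default fetcher assumes a curl binary on the PATH that accepts the same options as the command above.

```r
# Hypothetical helper: pull the file name out of raw response header lines.
filename_from_header_lines <- function(lines) {
  hit <- grep("^content-disposition:", lines, ignore.case = TRUE, value = TRUE)
  # No such header, or no filename= part: return NA
  if (length(hit) == 0 || !grepl("filename=", hit[1])) {
    return(NA_character_)
  }
  name <- sub(".*filename=", "", hit[1])
  # Drop the trailing carriage return and any surrounding quotes
  gsub('"', "", trimws(name), fixed = TRUE)
}

# Assumed fetcher: runs curl and returns its header output, one line per element.
fetch_header_lines <- function(url) {
  system2("curl", c("-sI", "-X", "GET", "-r", "0-10", shQuote(url)), stdout = TRUE)
}

# Build the url | file table; `fetch` can be swapped out, e.g. for testing.
build_file_table <- function(urls, fetch = fetch_header_lines) {
  files <- vapply(
    urls,
    function(u) filename_from_header_lines(fetch(u)),
    character(1),
    USE.NAMES = FALSE
  )
  data.frame(url = urls, file = files, stringsAsFactors = FALSE)
}
```

The result is a plain data frame; if the tibble package is installed, tibble::as_tibble() converts it.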

Upvotes: 1
