Reputation: 3765
I have a URL like http://example.com/files/01234
that, when I open it in a browser, downloads a zip file with a name like file-08.zip
With wget I can download using the real file name by running
wget --content-disposition http://example.com/files/01234
Functions such as basename do not work in this case. For example:
> basename("http://example.com/files/01234")
[1] "01234"
I'd like to obtain just the file name from the URL in R and create a tibble of the zip file names. It doesn't matter whether the solution uses packages or a system(...) command. Any ideas? What I'd like to obtain is something like:
url | file
--------------------------------------------
http://example.com/files/01234 | file-08.zip
http://example.com/files/03210 | file-09.zip
...
Upvotes: 2
Views: 822
Reputation: 206197
Using the httr library, you can make a HEAD call and then parse the content-disposition header. For example:
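library(httr)
# HEAD requests only the response headers, not the file itself
hh <- HEAD("https://example.com/01234567")
# Keep everything after "filename=" in the Content-Disposition header
get_disposition_filename <- function(x) {
  sub(".*filename=", "", headers(x)$`content-disposition`)
}
get_disposition_filename(hh)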
This function doesn't check that the header actually exists so it's not very robust, but should work in the case where the server returns an alternate name for the downloaded file.
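If the missing-header case matters, the parsing step can be separated from the request and made a little more defensive. The following is only a sketch: parse_disposition_filename is a hypothetical helper name, it takes the raw header value rather than the response object, and it assumes the plain filename= form (quoted or unquoted), not the extended filename*= form.

```r
# Hypothetical helper: extract the file name from a Content-Disposition value.
# Returns NA when the header is missing or carries no filename= parameter.
parse_disposition_filename <- function(header) {
  # Header absent (NULL), empty, or NA: nothing to parse
  if (is.null(header) || length(header) == 0 || is.na(header[1])) {
    return(NA_character_)
  }
  # Capture the value after filename=, with or without double quotes,
  # stopping at a closing quote or at the next ";" parameter separator
  m <- regmatches(
    header[1],
    regexec('filename="?([^";]+)"?', header[1], ignore.case = TRUE)
  )[[1]]
  if (length(m) < 2) NA_character_ else trimws(m[2])
}
```

With the response from above, it would be called as parse_disposition_filename(headers(hh)$`content-disposition`).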
Upvotes: 4
Reputation: 3765
With @Sathish's contribution:
When the file name to download doesn't appear in the URL string, a valid solution is
system("curl -IXGET -r 0-10 https://example.com/01234567 | grep attachment | sed 's/^.\\+filename=//'")
The idea is to request only the first few bytes of the zip (byte range 0-10) instead of the full file, and then extract the file name from the response headers. It prints file-789456.zip
or whatever the real zip name is for that URL. Pass intern = TRUE to system() to capture the name as a character vector instead of printing it.
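To get from a single name to the url/file table asked for in the question, the lookup can be applied over a vector of URLs. This is a rough sketch, not a tested solution: build_file_table and get_header are hypothetical names, and get_header is assumed to be a function that maps one URL to its Content-Disposition value (or NULL when the header is absent), however that value is fetched.

```r
# Hypothetical helper: build a url/file table from a vector of URLs.
# `get_header` is an assumed function: URL -> Content-Disposition value or NULL.
build_file_table <- function(urls, get_header) {
  files <- vapply(urls, function(u) {
    h <- get_header(u)
    # No header, or no filename= parameter in it: record NA for this URL
    if (is.null(h) || !grepl("filename=", h, fixed = TRUE)) {
      return(NA_character_)
    }
    # Drop everything up to filename= (and an opening quote), then a closing quote
    sub('"$', "", sub('.*filename="?', "", h))
  }, character(1), USE.NAMES = FALSE)
  data.frame(url = urls, file = files, stringsAsFactors = FALSE)
}

# With httr installed, the header could be fetched like this (not run here):
# get_header <- function(u) httr::headers(httr::HEAD(u))$`content-disposition`
```

The result is a plain data frame; tibble::as_tibble() converts it if a tibble is needed.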
Upvotes: 1