a83

Reputation: 21

Can R download specific files from a list of webpages?

I have files I would like to download from a webpage, e.g.

http://www.pdb.org/pdb/explore/explore.do?structureId=2FBA
http://www.pdb.org/pdb/explore/explore.do?structureId=2GVS

I have a list containing 2FBA, 2GVS, etc.

Using RCurl and XML, I know R can scrape information from a website into a data frame. Can I use R to download all the 2FBA.pdb, 2GVS.pdb, etc. files from these webpages, by instructing R to swap the last four letters in the URL (2FBA to 2GVS, ...) and save each file to my computer?

It seems this can be done with Python (a previous reply on Stack Overflow). However, I am not very familiar with Python, which is why I am asking whether R can do something similar in a smart way. Thanks for any comments.
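Roughly, I imagine something like the sketch below (the file URL pattern is only my guess, and the IDs are just the two examples above):

# rough idea: build each file URL from an ID and download it
ids <- c("2FBA", "2GVS")
for (id in ids) {
    file_url <- paste("http://www.pdb.org/pdb/files/", id, ".pdb", sep = "")
    download.file(file_url, destfile = paste(id, ".pdb", sep = ""))
}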

Upvotes: 2

Views: 1877

Answers (2)

Ramnath

Reputation: 55735

Here is one approach using the plyr package. The idea is to construct a function that downloads the pdb file for a given protein, and then use l_ply from plyr to loop over a list of proteins.

# function to download pdb file for a given protein
download_pdb = function(protein){

    base_url  = "http://www.pdb.org/pdb/files/"
    dest_file = paste(protein, '.pdb.gz', sep = "")    
    protein_url = paste(base_url, dest_file, sep = "")
    download.file(protein_url, destfile = dest_file)

}

proteins  = list('2FBA', '2GVS')
require(plyr)
l_ply(proteins, download_pdb)
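Note that the files are saved gzipped (as .pdb.gz). The same loop also works without plyr, using base R's lapply over a character vector (a small variant, not part of the original answer):

# equivalent loop with base R only
proteins <- c("2FBA", "2GVS")
invisible(lapply(proteins, download_pdb))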

Upvotes: 4

Roman Luštrik

Reputation: 70653

I would first paste together the desired URLs and then use download.file from the utils package. Something along the lines of:

my.url   <- "http://www.somewhere.com"
my.files <- c("file1.xxx", "file2.xxx", "file3.xxx")

# build the full url for each file
my.urls <- paste(my.url, my.files, sep = "/")

# download.file needs a destination file for each url
my.dl <- mapply(download.file, url = my.urls, destfile = my.files)
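Adapted to the files from the question (the http://www.pdb.org/pdb/files/ location is taken from the other answer; the IDs are the two examples given):

ids      <- c("2FBA", "2GVS")
my.files <- paste(ids, ".pdb.gz", sep = "")
my.urls  <- paste("http://www.pdb.org/pdb/files", my.files, sep = "/")
my.dl    <- mapply(download.file, url = my.urls, destfile = my.files)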

Upvotes: 3
