Reputation: 937
Oh, man. I am so terrible at removing for-loops from my code because I find them so intuitive, having first learned C++. Below, I fetch the IDs for a search term ("copd" in this case), then use each ID to retrieve its full XML record and save its affiliation/location into a vector. I do not know how to speed this up; it took about 5 minutes to run on 700 IDs, whereas most searches have 70,000+ IDs. Thank you for any and all guidance.
library(rentrez)
library(XML)
# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count
# set max to count
id <- entrez_search(db = "pubmed", term = "copd", retmax = count)$ids
# empty vector that will soon contain locations
location <- character()
# get all location data
for (i in seq_along(id)) {
  # fetch the full XML record for this ID
  test <- entrez_fetch(db = "pubmed", id = id[i], rettype = "XML")
  # convert the XML to a list
  test_list <- XML::xmlToList(test)
  # pull out the first author's affiliation
  location <- c(location,
                test_list$PubmedArticle$MedlineCitation$Article$AuthorList$Author$AffiliationInfo$Affiliation)
}
Upvotes: 0
Views: 142
Reputation: 4671
This may give you a start - it is possible to pull down multiple records at once using the search's web history.
library(rentrez)
library(xml2)
# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count
# set max to count
id_search <- entrez_search(db = "pubmed", term = "copd", retmax = count, use_history = TRUE)
# fetch all of the records held in the web history in one request
document <- entrez_fetch(db = "pubmed", rettype = "XML", web_history = id_search$web_history)
document_list <- as_list(read_xml(document))
The problem is that this is still time consuming because there are a large number of documents. It's also notable that it returned exactly 10,000 articles when I tried it - the E-utilities cap a single fetch at 10,000 records, so larger result sets have to be pulled down in batches (see the sketch below).
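If the 10,000-record ceiling is what you are hitting, the rentrez web_history vignette suggests paging through the stored search with retstart/retmax. Here is a rough sketch reusing id_search and count from above; batch_size and the sleep interval are arbitrary choices for illustration, not anything fixed by the API.
library(rentrez)

batch_size <- 500  # arbitrary; tune to taste

batches <- list()
for (start in seq(0, count - 1, by = batch_size)) {
  batches[[length(batches) + 1]] <- entrez_fetch(
    db = "pubmed",
    web_history = id_search$web_history,
    rettype = "XML",
    retmax = batch_size,
    retstart = start  # retstart is zero-based
  )
  Sys.sleep(0.34)  # stay under NCBI's ~3 requests/second limit without an API key
}
# each element of `batches` is a complete XML document covering one batch of records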
You can then use something like the purrr package to start extracting the information you want.
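As a minimal sketch of that extraction step - assuming document_list has a top-level PubmedArticleSet element and the same Affiliation path used in the question's loop:
library(purrr)

# first listed affiliation for each article; NA when the record has no
# AffiliationInfo node (path assumptions match the question's loop)
affiliations <- map_chr(document_list$PubmedArticleSet, function(article) {
  aff <- article$MedlineCitation$Article$AuthorList$Author$AffiliationInfo$Affiliation
  if (is.null(aff)) NA_character_ else aff[[1]]
})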
Upvotes: 2