Reputation: 933
I am trying to retrieve the search results for a query from PubMed via Biopython, using the following code:
    from Bio import Entrez
    from Bio import Medline

    Entrez.email = "[email protected]"
    LIM = 3

    def search(Term):
        handle = Entrez.esearch(db='pmc', term=Term, retmax=100000000)
        record = Entrez.read(handle)
        idlist = record["IdList"]
        handle = Entrez.efetch(db='pmc', id=idlist, rettype="medline", retmode="text")
        records = Medline.parse(handle)
        return list(records)

    mydic = search('(pathological conditions, signs and symptoms[MeSH Terms]) AND (pmc cc license[filter]) ')
    print(len(mydic))
No matter how many times I try, the output is 10000. I tried different queries, but I still get 10000. When I check the number of results manually in the browser, I get different numbers.
What exactly is going wrong, and how can I make sure I get all of the results?
Upvotes: 1
Views: 541
Reputation: 1614
You seem to change only the esearch limit and leave efetch alone (and NCBI seems to default to a limit of 10000). You need to use the retstart and retmax arguments on efetch to download the results in batches.
See the "Searching for and downloading abstracts using the history" example in the Biopython Tutorial, http://biopython.org/DIST/docs/tutorial/Tutorial.html or http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
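A rough sketch of that batching approach, modelled on the tutorial's history example, might look like the following. The names batch_ranges and fetch_all, the batch size of 500, and the placeholder email are my own assumptions for illustration, and I have not checked how the pmc database behaves with rettype="medline" beyond what the question already uses.

```python
def batch_ranges(total, batch_size):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def fetch_all(term, db="pmc", batch_size=500, email="you@example.com"):
    """Search `db` for `term`, then fetch every hit in batches (hypothetical helper)."""
    # Imported here so batch_ranges can be used without Biopython installed.
    from Bio import Entrez, Medline

    Entrez.email = email  # placeholder; use your own address

    # usehistory="y" keeps the result set on the NCBI history server,
    # so efetch can refer to it by WebEnv/QueryKey instead of an ID list.
    handle = Entrez.esearch(db=db, term=term, usehistory="y")
    search = Entrez.read(handle)
    handle.close()
    total = int(search["Count"])

    records = []
    for start, size in batch_ranges(total, batch_size):
        handle = Entrez.efetch(
            db=db,
            rettype="medline",
            retmode="text",
            retstart=start,
            retmax=size,
            webenv=search["WebEnv"],
            query_key=search["QueryKey"],
        )
        # Consume the parser before closing the handle.
        records.extend(Medline.parse(handle))
        handle.close()
    return records
```

Calling fetch_all with the query from the question should then return one record per hit, and comparing len() of the result with the "Count" value from esearch is a quick way to confirm nothing was cut off.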
Upvotes: 2