Tihamer

Reputation: 73

Download a list of all PubMed IDs by date (from-to)

I need to automate PubMed article harvesting. The only examples I have found either download PubMed articles matching a search-term query or download a single PubMed article by PMID. What I want instead is to download a LIST of PubMed IDs for a date range (from-to), or all of them, as the OAI interface allows.

Upvotes: 2

Views: 1608

Answers (1)

Maximilian Peters

Reputation: 31709

You can use Biopython for this. The following code snippet returns a link for every PubMed article in a given publication-date range. PMC articles can be downloaded directly; for the other articles the DOI link is provided, but the location of the PDF is publisher-specific and cannot be predicted for all articles.

def article_links(start_date, end_date='3000', batch_size=200):
    """
    start_date, end_date = 'YYYY/MM/DD'
    returns a list of PubMed Central links and a 2nd list of DOI links
    """
    from Bio import Entrez

    # NCBI requires an email address with every E-utilities request; use your own
    Entrez.email = "your.name@example.com"

    # get all articles in the publication-date range
    term = '("%s"[Date - Publication] : "%s"[Date - Publication])' % (start_date, end_date)
    # esearch returns only 20 IDs by default, so first ask for the total count;
    # retmax is capped (currently 10,000 for PubMed), so split larger ranges
    handle = Entrez.esearch(db="pubmed", term=term, retmax=0)
    count = int(Entrez.read(handle)['Count'])
    handle.close()
    handle = Entrez.esearch(db="pubmed", term=term, retmax=min(count, 10000))
    idlist = Entrez.read(handle)['IdList']
    handle.close()

    pmc_articles = []
    doi = []
    # fetch the full records in batches rather than in one huge request
    for start in range(0, len(idlist), batch_size):
        batch = ','.join(idlist[start:start + batch_size])
        handle = Entrez.efetch(db="pubmed", id=batch, retmode="xml")
        records = Entrez.read(handle)
        handle.close()
        for record in records.get('PubmedArticle', []):
            # PMC IDs and DOIs are both listed in PubmedData/ArticleIdList,
            # distinguished by their IdType attribute
            for article_id in record.get('PubmedData', {}).get('ArticleIdList', []):
                id_type = article_id.attributes.get('IdType')
                if id_type == 'pmc':
                    pmc_articles.append('http://www.ncbi.nlm.nih.gov/pmc/articles/%s/pdf/' % article_id)
                elif id_type == 'doi':
                    doi.append('http://dx.doi.org/' + str(article_id))

    return pmc_articles, doi


if __name__ == '__main__':
    print(article_links('2016/12/20'))
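A single esearch call returns at most one page of IDs, so harvesting every PMID in a long date range (the "or all of them" case in the question) needs the range to be split up. Below is a minimal sketch of one way to do that: bisect the date range until each window's hit count fits under the cap. It assumes the 10,000-ID esearch limit that NCBI currently documents for PubMed; `pdat_term`, `date_windows`, `pubmed_count`, `all_pmids` and `_entrez` are hypothetical helper names, and the email address is a placeholder.

```python
from datetime import date, timedelta

# Assumption: esearch returns at most 10,000 IDs per PubMed query.
MAX_IDS_PER_QUERY = 10000


def pdat_term(start, end):
    """Build the same [Date - Publication] range query used in the answer."""
    return '("%s"[Date - Publication] : "%s"[Date - Publication])' % (
        start.strftime('%Y/%m/%d'), end.strftime('%Y/%m/%d'))


def date_windows(start, end, count_for):
    """Yield contiguous (lo, hi) date windows, in order, each under the cap.

    count_for(lo, hi) returns the number of hits for a window. A window that
    is too big is split in half; a single day is never split further, so a
    day with more hits than the cap is still yielded (only part of it will
    then be retrievable).
    """
    pending = [(start, end)]
    while pending:
        lo, hi = pending.pop()
        if lo == hi or count_for(lo, hi) <= MAX_IDS_PER_QUERY:
            yield lo, hi
        else:
            mid = lo + (hi - lo) // 2
            # push the later half first so the earlier half is popped next
            pending.append((mid + timedelta(days=1), hi))
            pending.append((lo, mid))


def _entrez():
    """Import Biopython lazily and set the contact address NCBI requires."""
    from Bio import Entrez
    Entrez.email = "your.name@example.com"  # placeholder: use your own address
    return Entrez


def pubmed_count(lo, hi):
    """Number of PubMed records published in [lo, hi] (needs network access)."""
    Entrez = _entrez()
    handle = Entrez.esearch(db="pubmed", term=pdat_term(lo, hi), retmax=0)
    count = int(Entrez.read(handle)['Count'])
    handle.close()
    return count


def all_pmids(start, end):
    """Collect every PMID published between start and end (inclusive)."""
    Entrez = _entrez()
    pmids = []
    for lo, hi in date_windows(start, end, pubmed_count):
        handle = Entrez.esearch(db="pubmed", term=pdat_term(lo, hi),
                                retmax=MAX_IDS_PER_QUERY)
        pmids.extend(Entrez.read(handle)['IdList'])
        handle.close()
    # drop any duplicate IDs while keeping the order
    return list(dict.fromkeys(pmids))
```

For example, `all_pmids(date(2016, 1, 1), date(2016, 12, 31))` would walk through 2016 window by window. Every split costs an extra count query, so multi-year ranges will take a while; the resulting PMIDs can then be fed to `efetch` in batches as in the answer.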

Upvotes: 3
