Fetching abstracts from PubMed

Question

I'm having trouble getting the abstracts from the following query:

from Bio import Entrez

Entrez.email = "anonymous@gmail.com"
esearch_query = Entrez.esearch(db="pubmed", term="cancer AND food", retmode="xml")
esearch_result = Entrez.read(esearch_query)

# Now we need to get all papers from our search using the IdList
# (iterate over the list itself; IdList[-1] would loop over the characters of the last ID)
for iden in esearch_result['IdList']:
    pubmed_entry = Entrez.efetch(db="pubmed", id=iden, retmode="xml")
    result = Entrez.read(pubmed_entry)
    print(result)

Here is the output for one of the entries, as an example:

{u'PubmedArticle': [{u'MedlineCitation': DictElement({u'DateCompleted': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'OtherID': [], u'DateRevised': {u'Month': '03', u'Day': '22', u'Year': '2017'}, u'MeshHeadingList': [{u'QualifierName': [], u'DescriptorName': StringElement('Binding Sites', attributes={u'UI': u'D001665', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Cobalt', attributes={u'UI': u'D003035', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hemoglobins', attributes={u'UI': u'D006454', u'MajorTopicYN': u'Y'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Humans', attributes={u'UI': u'D006801', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Hydrogen-Ion Concentration', attributes={u'UI': u'D006863', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'N'})], u'DescriptorName': StringElement('Iron', attributes={u'UI': u'D007501', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Ligands', attributes={u'UI': u'D008024', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Mathematics', attributes={u'UI': u'D008433', u'MajorTopicYN': u'N'})}, {u'QualifierName': [StringElement('blood', attributes={u'UI': u'Q000097', u'MajorTopicYN': u'Y'})], u'DescriptorName': StringElement('Oxygen', attributes={u'UI': u'D010100', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Protein Binding', attributes={u'UI': u'D011485', u'MajorTopicYN': u'N'})}, {u'QualifierName': [], u'DescriptorName': StringElement('Spectrum Analysis', attributes={u'UI': u'D013057', u'MajorTopicYN': u'N'})}], 
u'OtherAbstract': [], u'CitationSubset': ['IM'], u'ChemicalList': [{u'NameOfSubstance': StringElement('Hemoglobins', attributes={u'UI': u'D006454'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Ligands', attributes={u'UI': u'D008024'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Oxyhemoglobins', attributes={u'UI': u'D010108'}), u'RegistryNumber': '0'}, {u'NameOfSubstance': StringElement('Cobalt', attributes={u'UI': u'D003035'}), u'RegistryNumber': '3G0H8C9362'}, {u'NameOfSubstance': StringElement('Iron', attributes={u'UI': u'D007501'}), u'RegistryNumber': 'E1UOL152H7'}, {u'NameOfSubstance': StringElement('Oxygen', attributes={u'UI': u'D010100'}), u'RegistryNumber': 'S88TT14065'}], u'KeywordList': [], u'DateCreated': {u'Month': '01', u'Day': '10', u'Year': '1976'}, u'SpaceFlightMission': [], u'GeneralNote': [], u'Article': DictElement({u'ArticleDate': [], u'Pagination': {u'MedlinePgn': '1424-31'}, u'AuthorList': ListElement([DictElement({u'LastName': 'Chow', u'Initials': 'YW', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'Y W'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Pietranico', u'Initials': 'R', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'R'}, attributes={u'ValidYN': u'Y'}), DictElement({u'LastName': 'Mukerji', u'Initials': 'A', u'Identifier': [], u'AffiliationInfo': [], u'ForeName': 'A'}, attributes={u'ValidYN': u'Y'})], attributes={u'CompleteYN': u'Y'}), u'Language': ['eng'], u'PublicationTypeList': [StringElement('Journal Article', attributes={u'UI': u'D016428'}), StringElement("Research Support, U.S. Gov't, Non-P.H.S.", attributes={u'UI': u'D013486'})], u'Journal': {u'ISSN': StringElement('0006-291X', attributes={u'IssnType': u'Print'}), u'ISOAbbreviation': 'Biochem. Biophys. Res. 
Commun.', u'JournalIssue': DictElement({u'Volume': '66', u'Issue': '4', u'PubDate': {u'Month': 'Oct', u'Day': '27', u'Year': '1975'}}, attributes={u'CitedMedium': u'Print'}), u'Title': 'Biochemical and biophysical research communications'}, u'ArticleTitle': 'Studies of oxygen binding energy to hemoglobin molecule.', u'ELocationID': []}, attributes={u'PubModel': u'Print'}), u'PMID': StringElement('6', attributes={u'Version': u'1'}), u'MedlineJournalInfo': {u'MedlineTA': 'Biochem Biophys Res Commun', u'Country': 'United States', u'NlmUniqueID': '0372516', u'ISSNLinking': '0006-291X'}}, attributes={u'Status': u'MEDLINE', u'Owner': u'NLM'}), u'PubmedData': {u'ArticleIdList': [StringElement('6', attributes={u'IdType': u'pubmed'}), StringElement('0006-291X(75)90518-5', attributes={u'IdType': u'pii'})], u'PublicationStatus': 'ppublish', u'History': [DictElement({u'Month': '10', u'Day': '27', u'Year': '1975'}, attributes={u'PubStatus': u'pubmed'}), DictElement({u'Minute': '1', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'medline'}), DictElement({u'Minute': '0', u'Month': '10', u'Day': '27', u'Hour': '0', u'Year': '1975'}, attributes={u'PubStatus': u'entrez'})]}}], u'PubmedBookArticle': []}

How can I get the abstract? Ultimately, I want to store some of the fields (such as title and abstract) in an SQL database.

Thanks, David

cdlane · Accepted Answer

What may be working against you is that MEDLINE/PubMed records from before 1975 typically have no abstracts, and your example (published in 1975) is right on the cusp. I worked with your code and a different query that turned up two article IDs, one with an abstract and one without:

from Bio import Entrez

Entrez.email = "anonymous@gmail.com"

esearch_query = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmode="xml")
esearch_result = Entrez.read(esearch_query)
esearch_query.close()

for identifier in esearch_result['IdList']:
    # Fetch the full record for one PubMed ID and parse the XML into dicts/lists
    pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
    result = Entrez.read(pubmed_entry)
    pubmed_entry.close()

    article = result['PubmedArticle'][0]['MedlineCitation']['Article']

    # The Abstract element is optional; older records often lack it entirely
    if 'Abstract' in article:
        print(article['Abstract']['AbstractText'])

Truncated output:

['This report catalogues all spontaneous proliferations in macropods, koalas, wombats, and possums and gliders held by the Comparative Pathology Registry at Taronga Zoo. Proliferative lesions were present in 14 macropods, 26 koalas, two wombats and 22 possums and gliders. Most neoplasms recorded in macropods were singular and ....']
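As the output shows, `AbstractText` is a list rather than a plain string: structured abstracts are split into several labelled sections (BACKGROUND, METHODS, and so on). Below is a minimal sketch of a helper that turns it into one string, or `None` when the record has no abstract. The function name `abstract_text` is hypothetical, and reading the section name from a `Label` attribute is an assumption based on the MEDLINE XML format. It relies on Biopython's `DictElement`, `ListElement` and `StringElement` subclassing `dict`, `list` and `str`, so plain Python objects stand in for them in the example.

```python
def abstract_text(article):
    """Return the abstract of a parsed PubMed Article as one string, or None."""
    abstract = article.get("Abstract")
    if abstract is None:
        # Many older records have no <Abstract> element at all.
        return None
    parts = []
    for section in abstract.get("AbstractText", []):
        # Biopython's StringElement keeps XML attributes in .attributes;
        # a plain str has none, so fall back to an empty dict.
        label = getattr(section, "attributes", {}).get("Label")
        parts.append(f"{label}: {section}" if label else str(section))
    return "\n".join(parts) if parts else None


# Plain dicts mimicking the shape returned by Entrez.read:
with_abstract = {"ArticleTitle": "Example", "Abstract": {"AbstractText": ["First part.", "Second part."]}}
without_abstract = {"ArticleTitle": "Studies of oxygen binding energy to hemoglobin molecule."}

print(abstract_text(with_abstract))     # prints "First part." and "Second part." on two lines
print(abstract_text(without_abstract))  # prints None
```

In the loop above you could call `abstract_text(article)` instead of printing `article['Abstract']['AbstractText']` directly, which also removes the need for the `if 'Abstract' in article` check.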

For details, see the document "MEDLINE/PubMed XML Element Descriptions".
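Since the end goal is to keep fields such as the title and abstract in an SQL database, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `articles`, its columns, and the helper `save_article` are hypothetical choices, and SQLite stands in for whatever database you actually use. It assumes `record` is one element of `result['PubmedArticle']` as returned by `Entrez.read`; a plain dict is used in its place here.

```python
import sqlite3


def save_article(conn, record):
    """Insert (or replace) the PMID, title and abstract of one parsed PubmedArticle."""
    citation = record["MedlineCitation"]
    article = citation["Article"]
    pmid = str(citation["PMID"])
    title = str(article.get("ArticleTitle", ""))
    # The Abstract element is optional; join its sections when present.
    sections = article.get("Abstract", {}).get("AbstractText", [])
    abstract = " ".join(str(s) for s in sections) or None
    # "?" placeholders let sqlite3 handle quoting (titles often contain apostrophes).
    conn.execute(
        "INSERT OR REPLACE INTO articles (pmid, title, abstract) VALUES (?, ?, ?)",
        (pmid, title, abstract),
    )


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    "CREATE TABLE IF NOT EXISTS articles (pmid TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
)

# A plain-dict stand-in for one entry of result['PubmedArticle']:
record = {
    "MedlineCitation": {
        "PMID": "6",
        "Article": {"ArticleTitle": "Studies of oxygen binding energy to hemoglobin molecule."},
    }
}
save_article(conn, record)
conn.commit()
print(conn.execute("SELECT pmid, title, abstract FROM articles").fetchall())
# prints [('6', 'Studies of oxygen binding energy to hemoglobin molecule.', None)]
```

Inside the fetch loop you would call `save_article(conn, result['PubmedArticle'][0])` for each ID and commit once at the end. Records without an abstract are stored with a `NULL` abstract rather than skipped, so you can still query their titles.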
