Toby

Reputation: 25

Searching PubMed using BioPython and writing to CSV

I am using BioPython to fill a CSV file of data about citations from their PubMed title. I have written this so far:

import csv
from Bio import Entrez
import bs4

Entrez.email = "my_email"
CSVfile = open('srData.csv')
fileReader = csv.reader(CSVfile)
Data = list(fileReader)

with open('blank.csv','w') as f1:
  writer=csv.writer(f1, delimiter='\t',lineterminator='\n',)
  for id in Data:
    handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
    record = Entrez.read(handle)
    title=record[0]['MedlineCitation']['Article']['ArticleTitle']
    abstract=record[0]['MedlineCitation']['Article']['Abstract']
    mesh =record[0]['MedlineCitation']['MeshHeadingList']
    descriptors = ','.join(term['DescriptorName'] for term in mesh)
    writer.writerow([title, abstract, descriptors])

However, this produces unusual output: the title, abstract and MeSH terms are spread across multiple columns rather than being separated, which I presume is due to their type. I want my CSV table to have three columns: one containing the title, one the abstract and one the MeSH terms.

How can I accomplish this?


To clarify, the first column contains the entire title and the beginning of the abstract, and the next few columns contain subsequent portions of the abstract. I need them split into distinct columns, i.e. the first column should contain only the title, the second only the abstract, and the third only the MeSH terms.

Currently, the first column contains:

"Distinct and combined vascular effects of ACE blockade and HMG-CoA reductase inhibition in hypertensive subjects.  {u'AbstractText': ['Hypercholesterolemia and hypertension are frequently associated with elevated sympathetic activity. Both are independent cardiovascular risk factors and both affect endothelium-mediated vasodilation. To identify the effects of cholesterol-lowering and antihypertensive treatments on vascular reactivity and vasodilative capacity"

Upvotes: 1

Views: 1816

Answers (1)

larsks

Reputation: 312440

The value of record[0]['MedlineCitation']['Article']['Abstract'] is a dictionary, not a string; as your sample output shows, the abstract text itself is stored as a list under its 'AbstractText' key. If you want the actual abstract, instead of:

abstract=record[0]['MedlineCitation']['Article']['Abstract']

You need:

abstract=record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0]

Now abstract contains a single string (the first element of the AbstractText list) and should be suitable for writing to your CSV file.
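For what it's worth, here is a minimal sketch of how the whole row could be built, assuming the record is shaped like the one in the question. The helper name build_row and the sample dictionary are made up for illustration; the sample is a plain-dict stand-in, not real PubMed data. It joins every element of AbstractText instead of taking only the first, and it writes with the csv module's default comma delimiter so that commas inside a field are quoted. I'm only guessing that the tab delimiter in your script contributes to the column spill when the file is opened as comma-separated; the question doesn't confirm that.

```python
import csv
import io


def build_row(citation):
    """Return [title, abstract, mesh_terms] as plain strings.

    `citation` is assumed to look like record[0]['MedlineCitation'].
    """
    article = citation['Article']
    title = str(article['ArticleTitle'])
    # Some records have no abstract; AbstractText is a list of sections.
    sections = article.get('Abstract', {}).get('AbstractText', [])
    abstract = ' '.join(str(section) for section in sections)
    # Some records have no MeSH headings either.
    mesh = citation.get('MeshHeadingList', [])
    descriptors = '; '.join(str(term['DescriptorName']) for term in mesh)
    return [title, abstract, descriptors]


# Hypothetical stand-in for one parsed citation (not real PubMed data).
sample = {
    'Article': {
        'ArticleTitle': 'An example title.',
        'Abstract': {'AbstractText': ['First part, with a comma.',
                                      'Second part.']},
    },
    'MeshHeadingList': [{'DescriptorName': 'Hypertension'},
                        {'DescriptorName': 'Aged'}],
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')  # default delimiter is ','
writer.writerow(build_row(sample))
csv_text = buffer.getvalue()
```

In your loop you would call writer.writerow(build_row(record[0]['MedlineCitation'])) in place of the three separate lookups.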

Update

I'm unable to reproduce the error you've described in your comment, even when using the same input data:

>>> from Bio import Entrez
>>> Entrez.email = '...'
>>> id=10067800
>>> handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
>>> record = Entrez.read(handle)
>>> abstract=record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0]
>>> abstract
StringElement('To assess the antihypertensive efficacy and safety of the novel AT1 receptor antagonist, telmisartan, compared with that of enalapril in elderly patients with mild to moderate hypertension.', attributes={u'NlmCategory': u'OBJECTIVE', u'Label': u'OBJECTIVE'})
>>> 
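Note that the value above carries Label and NlmCategory attributes of OBJECTIVE, which suggests this abstract is split into labelled sections and that [0] only returns the first one. If you want all of them in one cell, something along these lines might work. This is a rough sketch: labelled_abstract and FakeSection are hypothetical names, and FakeSection is only a stand-in so the example runs without Biopython or network access. It relies on the element being a str subclass with an attributes dictionary, as the output above indicates.

```python
def labelled_abstract(sections):
    """Join abstract sections, prefixing each with its Label if it has one."""
    parts = []
    for section in sections:
        # Plain strings have no `.attributes`, so fall back to an empty dict.
        attrs = getattr(section, 'attributes', None) or {}
        label = attrs.get('Label')
        text = str(section)
        parts.append(f'{label}: {text}' if label else text)
    return ' '.join(parts)


class FakeSection(str):
    """Hypothetical stand-in for a Biopython StringElement."""

    def __new__(cls, text, attributes=None):
        obj = super().__new__(cls, text)
        obj.attributes = attributes or {}
        return obj


sections = [
    FakeSection('To assess efficacy.', {'Label': 'OBJECTIVE'}),
    FakeSection('Patients were randomised.', {'Label': 'METHODS'}),
    'Unlabelled text.',
]
combined = labelled_abstract(sections)
```

With a real record you would pass record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'] as sections.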

Upvotes: 1
