RRg

Reputation: 123

Extracting author names from XML tags using ElementTree

Following is the link to access the XML document:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=%2726161999%27&retmode=xml

I'm trying to extract each author's name (ForeName + LastName) and build a single string containing only the author names. So far I have only been able to extract the two parts separately.

Here is the code I have tried:

    import requests
    import xml.etree.ElementTree as et

    r = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id='26161999'&retmode=xml")
    root = et.fromstring(r.content)
    authordata = []
    for elem in root.findall(".//ForeName"):
        elem_ = elem.text
        auth_name = list(elem_.split(" "))
        authordata.append(auth_name)
    # Flatten the nested list by joining each inner list into a string
    val = [item if isinstance(item, str) else " ".join(item) for item in authordata]
    # Remove duplicates while preserving order
    seen = set()
    val = [x for x in val if x not in seen and not seen.add(x)]
    author = ' '.join(val)
    print(author)

The output obtained from the above code is:

Elisa Riccardo Mirco Laura Valentina Antonio Sara Carla Borri Barbara

The expected output combines each author's ForeName and LastName:

Elisa Oppici Riccardo Montioli Mirco Dindo Laura Maccari Valentina Porcari Antonio Lorenzetto Chellini Sara Carla Borri Voltattorni Barbara Cellini

Upvotes: 0

Views: 216

Answers (1)

Tankred

Reputation: 316

From your question I understand that you want to concatenate ForeName and LastName for each author. You can do that by iterating over the Author elements in the tree, looking up both fields inside each one, and joining their text:

import xml.etree.ElementTree as et
import requests

r = requests.get(
     'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id="26161999"&retmode=xml'
)
root = et.fromstring(r.content)

author_names = []
for author in root.findall(".//Author"):
    fore_name = author.find('ForeName').text
    last_name = author.find('LastName').text
    author_names.append(fore_name + ' ' + last_name)

print(author_names)

# or to get your exact output format:
print(' '.join(author_names))
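
One caveat with the loop above: if an Author element lacks a ForeName or LastName tag (for example, a group author), find() returns None and accessing .text raises an AttributeError. Below is a minimal sketch of a more defensive variant that uses findtext() with a default instead. The SAMPLE_XML string and the extract_author_names helper are hypothetical, made up for illustration so the example runs without a network call; the real PubMed response contains many more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample that only mimics the Author layout;
# it is NOT the actual PubMed response.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <AuthorList>
      <Author>
        <LastName>Oppici</LastName>
        <ForeName>Elisa</ForeName>
      </Author>
      <Author>
        <LastName>Montioli</LastName>
        <ForeName>Riccardo</ForeName>
      </Author>
      <Author>
        <CollectiveName>Example Study Group</CollectiveName>
      </Author>
    </AuthorList>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_author_names(xml_text):
    """Return 'ForeName LastName' strings, skipping authors with neither tag."""
    root = ET.fromstring(xml_text.strip())
    names = []
    for author in root.iter("Author"):
        # findtext() returns the default instead of None when the tag is missing
        fore = author.findtext("ForeName", default="").strip()
        last = author.findtext("LastName", default="").strip()
        full = " ".join(part for part in (fore, last) if part)
        if full:
            names.append(full)
    return names


author_names = extract_author_names(SAMPLE_XML)
print(author_names)            # ['Elisa Oppici', 'Riccardo Montioli']
print(" ".join(author_names))  # Elisa Oppici Riccardo Montioli
```

To use it with the live data, you could pass r.content from the request above to extract_author_names in place of SAMPLE_XML.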

Upvotes: 1
