Parsing xml file using Python3 and BeautifulSoup

Question

I know there are several answers to questions regarding xml parsing with Python 3, but I can't find the answer to two that I have. I am trying to parse and extract information from a BoardGameGeek xml file that looks like the following (it's too long for me to paste in here):

https://www.boardgamegeek.com/xmlapi/boardgame/10

1) I am having trouble extracting the primary game name from these two lines:

Elfenland
Elfenland (Волшебное Путешествие)

2) I am also having trouble extracting lists of data, such as in this xml:

Currently, my code is very simple and looks like this. It only extracts simple single-value XML elements. Any help on how to extract the more complex information would be great. Thank you.

import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.boardgamegeek.com/xmlapi/boardgame/10'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8') # a `str`
soup = BeautifulSoup(text, 'xml')
yearpublished = soup.find_all('yearpublished')

Dan-Dev · Accepted Answer

For the first part, try searching for the "name" element that has a "primary" attribute, like this:

from bs4 import BeautifulSoup
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request

url = 'https://www.boardgamegeek.com/xmlapi/boardgame/10'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8') # a `str`
soup = BeautifulSoup(text, 'xml')
# primary=True matches a <name> tag that has a "primary" attribute, whatever its value
name = soup.find('name', primary=True)

print(name.get_text())

Outputs:

Elfenland
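
If BeautifulSoup or lxml is not installed, the same "element that carries a given attribute" lookup can be sketched with the standard library's xml.etree.ElementTree instead. This is a different parser from the one used above, not the answer's method. The inline sample below is hypothetical: it is modeled on the two names from the question, and the wrapper elements and the attribute value "true" are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the two names in the question;
# the wrapper elements and the value "true" are assumptions.
sample = """<boardgames>
  <boardgame>
    <name primary="true">Elfenland</name>
    <name>Elfenland (Волшебное Путешествие)</name>
  </boardgame>
</boardgames>"""

root = ET.fromstring(sample)

# [@primary] keeps only <name> elements that have a "primary" attribute,
# whatever its value, which mirrors primary=True in BeautifulSoup.
primary_name = root.find(".//name[@primary]")

# For comparison: every <name> element, primary or not.
all_names = [el.text for el in root.iter("name")]

print(primary_name.text)
print(len(all_names))
```

With this sample, the first print shows the primary name only, while the list holds both names.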

For the second part, loop over the "results" elements and extract the data you want:

text = """

    
        
...
        
    

"""
soup = BeautifulSoup(text,'xml')

for result in soup.find_all('results'):
    numplayers = result['numplayers']
    best = result.find('result', {'value': 'Best'})['numvotes']
    recommended = result.find('result', {'value': 'Recommended'})['numvotes']
    not_recommended = result.find('result', {'value': 'Not Recommended'})['numvotes']
    print(numplayers, best, recommended, not_recommended)

Outputs:

1 0 0 58
2 2 21 53
3 10 46 17
4 47 36 1
5 35 44 2
6 23 48 11
6+ 0 1 46

Or, if you want to do it more elegantly, find all values of each attribute and zip them:

soup = BeautifulSoup(text,'xml')
numplayers = [tag['numplayers'] for tag in soup.find_all('results')]
best = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Best'})]
recommended = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Recommended'})]
not_recommended = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Not Recommended'})]
print(list(zip(numplayers, best, recommended, not_recommended)))

Outputs:

[('1', '0', '0', '58'), ('2', '2', '21', '53'), ('3', '10', '46', '17'), ('4', '47', '36', '1'), ('5', '35', '44', '2'), ('6', '23', '48', '11'), ('6+', '0', '1', '46')]
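
One caveat with the zipped version: it assumes every "results" element contains all three "result" entries. If one entry were missing, the four lists would have different lengths and zip would pair votes with the wrong player count. Below is a minimal sketch that reads the votes per "results" element so each row stays aligned. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup so that it runs without extra packages. The sample XML is rebuilt from the output above, and the `poll` wrapper element and the helper name `votes_by_numplayers` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Rows copied from the output above: (numplayers, Best, Recommended, Not Recommended).
rows = [('1', '0', '0', '58'), ('2', '2', '21', '53'), ('3', '10', '46', '17'),
        ('4', '47', '36', '1'), ('5', '35', '44', '2'), ('6', '23', '48', '11'),
        ('6+', '0', '1', '46')]
labels = ("Best", "Recommended", "Not Recommended")

# Build sample XML from those rows; the <poll> wrapper is an assumption.
poll = ET.Element("poll")
for numplayers, *votes in rows:
    results = ET.SubElement(poll, "results", numplayers=numplayers)
    for label, count in zip(labels, votes):
        ET.SubElement(results, "result", value=label, numvotes=count)
text = ET.tostring(poll, encoding="unicode")


def votes_by_numplayers(xml_text):
    """Return one tuple per <results> element; a missing category becomes None."""
    root = ET.fromstring(xml_text)
    table = []
    for results in root.iter("results"):
        # Map each category name to its vote count within this element only.
        counts = {r.get("value"): r.get("numvotes") for r in results.findall("result")}
        table.append((results.get("numplayers"),
                      *(counts.get(label) for label in labels)))
    return table


parsed = votes_by_numplayers(text)
print(parsed)
```

Because the lookup happens inside each "results" element, a missing category shows up as None in that row instead of shifting the values of later rows.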
