Reputation: 25
Stuck on an assignment dealing with URL and XML parsing. I've got the data out but can't seem to get findall() to work. I know that once findall() works I'll have a list to loop through. Any insight would be great; I'm hoping for a gentle nudge rather than an outright answer, if possible. Thank you!
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
fhand = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
raw_data = fhand.read().decode()
xml_data = ET.fromstring(raw_data)
lst = xml_data.findall('name')
print(lst)
Upvotes: 2
Views: 580
Reputation: 979
You could use the requests library and BeautifulSoup for this:
import requests
from bs4 import BeautifulSoup
response = requests.get('http://py4e-data.dr-chuck.net/comments_42.xml')
soup = BeautifulSoup(response.text, 'html.parser')
names = soup.find_all('name')
for name in names:
    print(name.text)
Output:
Romina
Laurie
Bayli
Siyona
Taisha
Alanda
Ameelia
Prasheeta
Asif
Risa
Zi
Danyil
Ediomi
Barry
Lance
Hattie
Mathu
Bowie
Samara
Uchenna
Shauni
Georgia
Rivan
Kenan
Hassan
Isma
Samanthalee
Alexa
Caine
Grady
Anne
Rihan
Alexei
Indie
Rhuairidh
Annoushka
Kenzi
Shahd
Irvine
Carys
Skye
Atiya
Rohan
Nuala
Maram
Carlo
Japleen
Breeanna
Zaaine
Inika
Upvotes: 1
Reputation: 81654
findall is not recursive, meaning it will not find a node/element if it is not directly under the element you called findall on (if not using xpath, that is).
Instead, use iter:
import urllib.request
import xml.etree.ElementTree as ET
fhand = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')
raw_data = fhand.read().decode()
xml_data = ET.fromstring(raw_data)
for name_node in xml_data.iter('name'):
    print(name_node.text)
or findall with xpath:
xml_data.findall('comments/comment/name')
Both approaches yield the same names (for findall, loop over the returned list and print each element's .text):
Romina
Laurie
Bayli
Siyona
Taisha
Alanda
Ameelia
Prasheeta
Asif
Risa
Zi
Danyil
Ediomi
Barry
Lance
Hattie
Mathu
Bowie
Samara
Uchenna
Shauni
Georgia
Rivan
Kenan
Hassan
Isma
Samanthalee
Alexa
Caine
Grady
Anne
Rihan
Alexei
Indie
Rhuairidh
Annoushka
Kenzi
Shahd
Irvine
Carys
Skye
Atiya
Rohan
Nuala
Maram
Carlo
Japleen
Breeanna
Zaaine
Inika
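To see the difference without hitting the network, here is a minimal sketch that runs the same lookups on a small inline document. The SAMPLE string is a hand-written stand-in, not the real feed: its nesting is only assumed from the 'comments/comment/name' path above, the root tag name is an assumption, and it reuses the first two names from the output. The './/name' form is ElementTree's "search all descendants" path syntax, added here for comparison.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the real feed (assumed structure, two names only).
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Romina</name></comment>
    <comment><name>Laurie</name></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of root, so nothing is found.
direct = root.findall('name')

# Spelling out the full path from root reaches the nested elements.
nested = [n.text for n in root.findall('comments/comment/name')]

# './/name' searches all descendants, whatever their depth.
anywhere = [n.text for n in root.findall('.//name')]

# iter() walks the whole subtree and yields every matching element.
iterated = [n.text for n in root.iter('name')]

print(direct)
print(nested)
print(anywhere)
print(iterated)
```

The first print shows an empty list, which is the behaviour from the question; the other three show the same two names.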
Upvotes: 1