TomR

BeautifulSoup XML parsing not working

I'm trying to parse an XML page with BeautifulSoup and for some reason it's not able to find the XML parser. I don't think it's a path issue as I've used lxml to parse pages in the past, just not XML. Here's the code:

from bs4 import BeautifulSoup
import urllib2
import lxml

BASE_URL = "http://auctionresults.fcc.gov/Auction_66/Results/xml/round/66_115_database_round.xml"

# Send all urllib2 requests through the proxy
proxy = urllib2.ProxyHandler({'http': 'http://myProxy.com'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
page = urllib2.urlopen(BASE_URL)

# "xml" asks BeautifulSoup for lxml's XML parser
soup = BeautifulSoup(page, "xml")

print soup
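On Python 3, urllib2 was folded into urllib.request, so the fetch-and-parse steps above translate roughly as follows. This is a sketch, not the asker's code: the helper names `build_proxy_opener` and `fetch_soup` are made up for illustration, and `http://myProxy.com` is the question's placeholder proxy.

```python
import urllib.request

from bs4 import BeautifulSoup

BASE_URL = ("http://auctionresults.fcc.gov/Auction_66/Results/xml/round/"
            "66_115_database_round.xml")
PROXY_URL = "http://myProxy.com"  # placeholder proxy from the question


def build_proxy_opener(proxy_url):
    """Hypothetical helper: Python 3 version of the urllib2 proxy setup."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_soup(url, proxy_url, features="xml"):
    """Hypothetical helper: download url through the proxy and parse it.

    features is passed straight to BeautifulSoup ("xml" requires lxml).
    """
    opener = build_proxy_opener(proxy_url)
    with opener.open(url) as response:
        return BeautifulSoup(response.read(), features)
```

Calling `fetch_soup(BASE_URL, PROXY_URL)` performs the actual request. Using the opener object directly, rather than `install_opener`, keeps the proxy local to these calls instead of changing urllib's global default.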

I'm probably missing something simple, but all the BeautifulSoup XML-parsing questions I found here were about BS3, and I'm using BS4, which uses a different method for parsing XML. Thanks.
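When BeautifulSoup can't resolve a requested parser feature such as "xml", it raises `bs4.FeatureNotFound`. One thing worth ruling out (a guess; the thread doesn't confirm the cause) is that the script runs under a different interpreter from the one lxml was installed into. The Python 3 sketch below asks bs4's builder registry which features it can resolve in the current interpreter; the function name `available_features` is hypothetical, and the code tolerates bs4 not being installed at all.

```python
import importlib.util
import sys

FEATURES = ("xml", "lxml-xml", "lxml", "html.parser")


def available_features(features=FEATURES):
    """Map each BeautifulSoup feature string to True if bs4 can resolve it."""
    # Without bs4 itself, no feature can be resolved.
    if importlib.util.find_spec("bs4") is None:
        return {f: False for f in features}
    from bs4.builder import builder_registry
    # lookup() returns a TreeBuilder class, or None when no installed
    # parser library provides the requested feature.
    return {f: builder_registry.lookup(f) is not None for f in features}


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for feature, found in available_features().items():
        print(feature, "->", "found" if found else "NOT found")
```

If "xml" reports NOT found while "html.parser" is found, bs4 is installed for that interpreter but lxml is not visible to it; installing lxml with that same interpreter (for example via `python -m pip install lxml`, using the path printed above) would be the thing to try.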


Answers (1)

WGS

If you have lxml installed, just pass "lxml" as BeautifulSoup's parser instead, as below.

Code:

from bs4 import BeautifulSoup as bsoup
import requests as rq

url = "http://auctionresults.fcc.gov/Auction_66/Results/xml/round/66_115_database_round.xml"
r = rq.get(url)

# "lxml" selects lxml's HTML parser
soup = bsoup(r.content, "lxml")
print soup

Result:

<html><body><dataroot xmlns:od="urn:schemas-microsoft-com:officedata" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:nonamespaceschemalocation="66_database.xsd"><all_bids>
<auction_id>66</auction_id>
<auction_description>Advanced Wireless Services</auction_description>
... really long list follows...
[Finished in 34.9s]
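The `<html><body>` wrapper in that output appears because the "lxml" feature selects lxml's HTML parser, which also lowercases tag and attribute names (hence `nonamespaceschemalocation`). The "xml" feature (alias "lxml-xml") uses lxml's XML parser and keeps the markup as written. Below is a minimal comparison on a made-up fragment shaped like the FCC file, assuming bs4 and lxml are both installed; the mixed-case `All_Bids` tag is invented purely to show the difference.

```python
from bs4 import BeautifulSoup

# Made-up fragment; the real FCC file is far larger and its exact
# tag case is not reproduced here.
SNIPPET = "<dataroot><All_Bids><auction_id>66</auction_id></All_Bids></dataroot>"

# "lxml": lxml's HTML parser wraps the fragment in <html><body>
# and lowercases tag names.
html_soup = BeautifulSoup(SNIPPET, "lxml")

# "xml": lxml's XML parser keeps the structure and tag case intact.
xml_soup = BeautifulSoup(SNIPPET, "xml")

print(html_soup.find("all_bids"))
print(xml_soup.find("All_Bids"))
print(xml_soup.find("auction_id").get_text())
```

Since the question's data is XML, "xml" is the closer fit once bs4 can see lxml; "lxml" also gets at the data, as the output above shows, but treats the file as HTML.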

Let us know if this helps.
