Archit Jain

Reputation: 2234

BeautifulSoup doesn't read tags properly

I am trying to parse an XML file using BeautifulSoup, but the output is wrong.

file.xml:

<?xml version="1.0" ?> 
<opening name="value1" >
      <element name="value1.1"/>
      <element name="value1.2">
        <element name="1.2.1"/>
      </element>
      <element name="value1.3">
        <element name="value1.3.1"/>
      </element>
</opening>

Using the following code:

>>> a=open('file.xml').read()
>>> import BeautifulSoup
>>> s= BeautifulSoup.BeautifulSoup(a)
>>> print s.prettify()

I get the following output:

<?xml version='1.0' encoding='utf-8'?>
<opening name="value1">
 <element name="value1.1">
 </element>
 <element name="value1.2">
 </element>
 <element name="1.2.1">
 </element>
 <element name="value1.3">
 </element>
 <element name="value1.3.1">
 </element>
</opening>

Why does it show all the elements as children of the opening tag? How do I parse this file properly?

I've also tried s = BeautifulSoup.BeautifulStoneSoup(a), but that didn't work either.

Upvotes: 0

Views: 124

Answers (2)

Aaron DeVore

Reputation: 73

Beautiful Soup 3 requires a special argument to get tags to close properly. Pass the selfClosingTags argument to the BeautifulStoneSoup constructor, listing the tag names that should be treated as self-closing. Use something like:

from BeautifulSoup import BeautifulStoneSoup  # Beautiful Soup 3

soup = BeautifulStoneSoup(markup, selfClosingTags=['element'])

Upvotes: 0

Jon Clements

Reputation: 142106

BeautifulSoup is primarily an HTML parser that tries its best to deal with malformed HTML. There are dedicated XML libraries such as lxml, which I highly recommend. Try that.

An example:

import lxml.etree

xml = """<?xml version="1.0" ?> 
<opening name="value1" >
      <element name="value1.1"/>
      <element name="value1.2">
        <element name="1.2.1"/>
      </element>
      <element name="value1.3">
        <element name="value1.3.1"/>
      </element>
</opening>
"""

r = lxml.etree.fromstring(xml)
r.xpath('//element/@name')
# ['value1.1', 'value1.2', '1.2.1', 'value1.3', 'value1.3.1']
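
If lxml isn't installed, the standard library's xml.etree.ElementTree should be able to parse this file as well. Below is a minimal sketch, not a definitive solution: outline is a hypothetical helper written for this example (it is not part of any library) that returns one indented line per element, so you can check that the nesting from the file is preserved.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0" ?>
<opening name="value1" >
      <element name="value1.1"/>
      <element name="value1.2">
        <element name="1.2.1"/>
      </element>
      <element name="value1.3">
        <element name="value1.3.1"/>
      </element>
</opening>
"""


def outline(node, depth=0):
    """Hypothetical helper: one indented line per element, depth-first."""
    lines = ["  " * depth + node.tag + " " + node.get("name", "")]
    for child in node:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each extra level of indentation in the printed outline corresponds to one level of nesting, so elements such as 1.2.1 should appear under value1.2 rather than directly under opening.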

Upvotes: 1
