Reputation: 735
I have a file like this:
<a>
<b>1</b>
</a>
<a>
<b>2</b>
</a>
<a>
<b>3</b>
</a>
I want to get all the information inside, so I wrote this code:
from bs4 import BeautifulSoup
infile = open("testA.xml",'r')
contents = infile.read()
soup=BeautifulSoup(contents,'xml')
result = soup.find_all('a')
print(result)
The output:
[<a>
<b>1</b>
</a>]
I don't understand why I can't retrieve all the info from the file. I want something like this:
[<a>
<b>1</b>
</a>,
<a>
<b>2</b>
</a>,
<a>
<b>3</b>
</a>]
Thank you all
Upvotes: 1
Views: 1204
Reputation: 476534
If your file really is an XML file, it should contain an XML header.
If it is not, you can use lxml as the parser:
from bs4 import BeautifulSoup
infile = open("testA.xml",'r')
contents = infile.read()
soup=BeautifulSoup(contents,'lxml')
result = soup.find_all('a')
print(result)
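For comparison, here is a minimal sketch that parses the question's markup from an inline string (instead of reading testA.xml) and counts the <a> tags. It assumes bs4 is installed and swaps in Python's built-in "html.parser", which needs no extra package; like lxml's HTML parser, it does not require the input to have a single root element. The helper name count_a_tags is hypothetical.

```python
from bs4 import BeautifulSoup

# The question's markup, inlined so the example needs no testA.xml file.
MARKUP = """<a>
<b>1</b>
</a>
<a>
<b>2</b>
</a>
<a>
<b>3</b>
</a>
"""


def count_a_tags(markup, parser):
    """Hypothetical helper: parse markup with the given parser and count <a> tags."""
    soup = BeautifulSoup(markup, parser)
    return len(soup.find_all('a'))


# html.parser ships with Python, so this runs even without lxml installed.
print(count_a_tags(MARKUP, 'html.parser'))  # 3
```

If lxml is installed, passing 'lxml' instead of 'html.parser' should also find all three tags, as the session below shows.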
Note that you should use a context manager (with) when you read from files, so you can write it more elegantly as:
from bs4 import BeautifulSoup
with open("testA.xml", 'r') as infile:
    contents = infile.read()  # the file is closed automatically after this block
soup = BeautifulSoup(contents, 'lxml')
result = soup.find_all('a')
print(result)
This ensures Python closes the file as soon as execution leaves the with block.
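To see the with statement's effect directly, this sketch writes a throwaway temporary file (standing in for testA.xml, an assumption of the example) and checks the file object's closed attribute inside and after the block.

```python
import os
import tempfile

# Create a throwaway file standing in for testA.xml.
fd, path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'w') as out:
    out.write('<a>\n<b>1</b>\n</a>\n')

with open(path, 'r') as infile:
    contents = infile.read()
    closed_inside = infile.closed   # still open while inside the block

closed_after = infile.closed        # closed automatically on leaving the block
os.remove(path)

print(closed_inside, closed_after)  # False True
```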
Running this in Python 3 gives:
$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> infile = open("testA.xml",'r')
>>> contents = infile.read()
>>> soup=BeautifulSoup(contents,'lxml')
>>> result = soup.find_all('a')
>>> result
[<a>
<b>1</b>
</a>, <a>
<b>2</b>
</a>, <a>
<b>3</b>
</a>]
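Since the goal is the information inside the tags, a possible next step is to read the text of each <b> from the matched <a> elements. This sketch inlines the markup and uses html.parser so it runs without lxml; both choices are assumptions of the example rather than part of the answer above.

```python
from bs4 import BeautifulSoup

markup = "<a>\n<b>1</b>\n</a>\n<a>\n<b>2</b>\n</a>\n<a>\n<b>3</b>\n</a>\n"
soup = BeautifulSoup(markup, 'html.parser')
result = soup.find_all('a')

# a.b is the first <b> inside each <a>; get_text() returns its text content.
values = [a.b.get_text() for a in result]
print(values)  # ['1', '2', '3']
```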
Upvotes: 2
Reputation: 318
The main problem is that you don't have a root tag. Change your XML file to:
<?xml version="1.0" encoding="utf-8"?>
<content>
<a>
<b>1</b>
</a>
<a>
<b>2</b>
</a>
<a>
<b>3</b>
</a>
</content>
You can rename the content root tag to whatever suits your data.
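The root-element requirement can also be checked with the standard library. This sketch swaps in xml.etree.ElementTree instead of BeautifulSoup: parsing the original markup raises a ParseError because content follows the first closed element, while wrapping the same markup in a <content> root (done in code here rather than by editing the file) yields all three <a> elements. It assumes the markup has no XML declaration of its own, since a declaration cannot appear inside the wrapper.

```python
import xml.etree.ElementTree as ET

markup = "<a>\n<b>1</b>\n</a>\n<a>\n<b>2</b>\n</a>\n<a>\n<b>3</b>\n</a>\n"

# Without a single root element, the XML parser rejects the input.
try:
    ET.fromstring(markup)
    parsed_without_root = True
except ET.ParseError:
    parsed_without_root = False

# Wrapping everything in one root element makes it well-formed XML.
root = ET.fromstring("<content>" + markup + "</content>")
values = [a.find('b').text for a in root.findall('a')]

print(parsed_without_root, values)  # False ['1', '2', '3']
```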
Upvotes: 1