1pa

Reputation: 735

Python Beautiful Soup "find_all" only returns one result

I have a file like this:

<a>
    <b>1</b>
</a>
<a>
    <b>2</b>
</a>
<a>
    <b>3</b>
</a>

I want all the information inside it, so I wrote this code:

from bs4 import BeautifulSoup
infile = open("testA.xml",'r')
contents = infile.read()
soup=BeautifulSoup(contents,'xml')
result = soup.find_all('a')
print(result)

The output:

[<a>
<b>1</b>
</a>]

I don't understand why I can't retrieve all the info from the file. I want something like this:

[<a>
<b>1</b>
</a>, 
<a>
<b>2</b>
</a>, 
<a>
<b>3</b>
</a>]

Thank you all

Upvotes: 1

Views: 1204

Answers (2)

willeM_ Van Onsem

Reputation: 476534

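If your file is truly meant to be XML, it should be well-formed: it normally starts with an XML declaration and must have a single root element, which this file lacks.

If it is not, you can use lxml as the parser instead, which is more lenient: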

from bs4 import BeautifulSoup
infile = open("testA.xml",'r')
contents = infile.read()
soup=BeautifulSoup(contents,'lxml')
result = soup.find_all('a')
print(result)

Note that it is better to use a context manager (with) when reading from files, which also makes the code more elegant:

from bs4 import BeautifulSoup
with open("testA.xml",'r') as infile:
    contents = infile.read()
    soup=BeautifulSoup(contents,'lxml')
    result = soup.find_all('a')
    print(result)

This makes Python close the file as soon as execution leaves the with block.

Running the lxml version in Python 3 gives:

$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> infile = open("testA.xml",'r')
>>> contents = infile.read()
>>> soup=BeautifulSoup(contents,'lxml')
>>> result = soup.find_all('a')
>>> result
[<a>
<b>1</b>
</a>, <a>
<b>2</b>
</a>, <a>
<b>3</b>
</a>]
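
To check the claim about with closing the file, here is a minimal sketch using only the standard library. The temporary directory and the one-line file contents are assumptions made for illustration, not the file from the question:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on testA.xml existing.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "testA.xml")
    with open(path, "w") as outfile:
        outfile.write("<a><b>1</b></a>")

    with open(path, "r") as infile:
        contents = infile.read()
        open_inside = not infile.closed  # still open inside the block

    closed_after = infile.closed  # closed once the block has been left

print(open_inside, closed_after, contents)
```

The file object is still reachable after the block, but it is closed, so any parsing that needs the data should work from contents rather than from infile.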

Upvotes: 2

Harry

Reputation: 318

The main problem is that you don't have a root tag. Change your XML file to:

<?xml version="1.0" encoding="utf-8"?>
<content>
    <a>
        <b>1</b>
    </a>
    <a>
        <b>2</b>
    </a>
    <a>
        <b>3</b>
    </a>
</content>

You can rename the content root tag to whatever suits your data.
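
If editing the file is not an option, the same idea can be applied in memory. This is a minimal sketch, assuming the text has no XML declaration of its own. It uses the standard library's xml.etree.ElementTree instead of Beautiful Soup so that it runs without extra packages, and parse_fragments and its root_tag parameter are hypothetical names:

```python
import xml.etree.ElementTree as ET

# The same rootless content as the file in the question.
raw = """<a>
    <b>1</b>
</a>
<a>
    <b>2</b>
</a>
<a>
    <b>3</b>
</a>
"""


def parse_fragments(text, root_tag="content"):
    """Wrap rootless XML in a single root element and parse it.

    Assumes text carries no XML declaration; one would have to be
    stripped before wrapping.
    """
    return ET.fromstring("<{0}>{1}</{0}>".format(root_tag, text))


# A strict parser rejects the unwrapped text: only one root is allowed.
try:
    ET.fromstring(raw)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

root = parse_fragments(raw)
values = [a.findtext("b") for a in root.findall("a")]
print(strict_ok, values)
```

The strict parser refuses the rootless text, while the wrapped text yields all three values. The same wrapping should work before passing the string to BeautifulSoup with the 'xml' parser.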

Upvotes: 1
