radar

Reputation: 510

Issues with Python XML parsing

I'm new to XML and REST but have some basic knowledge of Python. I'm having trouble parsing the XML file below.

I use the BeautifulSoup library to parse the file and, for some reason, I can access the fields of entries 2 and 3 but not those of entry 1, even though all three are formatted the same way. Can someone tell me what I'm doing wrong? My code and its output are shown below.

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">News</title>
    <id>1</id>
    <link href="" />
    <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
    <updated>2014-11-26T10:41:12.424Z</updated>
    <author />
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <author>
            <name>User213</name>
        </author>
        <summary type="html">Test PUT Entry 3</summary>
        <id>7</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-24T09:55:31.000Z</updated>
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7" rel="self" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/editEntry" rel="edit" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/comments" rel="replies" type="application/atom+xml" length="0" />
    </entry>
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <author>
            <name>User213</name>
        </author>
        <summary type="html">Test PUT Entry 8</summary>
        <id>8</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-24T13:47:09.000Z</updated>
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
    </entry>
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <author>
            <name>User213</name>
        </author>
        <summary type="html">Test POST</summary>
        <id>12</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-25T14:29:02.000Z</updated>
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
        <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
    </entry>
</feed>

Python code:

#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
handler = open("/tmp/test.xml").read()

# Build the parse tree before searching it
soup = BeautifulSoup(handler)
results = soup.findAll('entry')
for r in results:
    print r
    print r.find('title').text
    print r.find('content').text
    print r.find('georss:point')
    print r.find('id')
    print r.find('updated')

And the output is the following:

<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
</entry>
TEST REST
1
None
None
None
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 8</summary>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>8</id>
<updated>2014-11-24T13:47:09.000Z</updated>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test POST</summary>
<id>12</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-25T14:29:02.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>12</id>
<updated>2014-11-25T14:29:02.000Z</updated>

Upvotes: 0

Views: 89

Answers (1)

Fumbo

Reputation: 110

Here is what I get when I test with the following code:

#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
handler = open("./test.xml").read()

soup = BeautifulSoup(handler)
print soup.prettify()

The output looks like this:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
 <title type="text">
  News
 </title>
 <id>
  1
 </id>
 <link href="" />
 <link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
 <updated>
  2014-11-26T10:41:12.424Z
 </updated>
 <author>
  <entry xmlns:georss="http://www.georss.org/georss">
   <title type="html">
    TEST REST
   </title>
   <content type="html">
    1
   </content>
  </entry>
 </author>
 <author>
  <name>
   User213
  </name>
 </author>

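If you look closely, you will see that BeautifulSoup treats the <author /> in your XML as an opening tag.

That's why you can't find id, georss:point and updated in the first entry: the first <entry> ends up nested inside that unclosed <author> and is cut off right after <content>, so for the parser the remaining fields are outside the entry.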
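
One possible way around this is to use a real XML parser, which reads <author /> as an empty element. The sketch below swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup and is written for Python 3, so it is not the code from the question. The SAMPLE string is a trimmed copy of the feed, and the parse_entries helper and the "atom"/"georss" prefixes are names made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this sketch; the URIs are the ones used in the feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}

# Trimmed copy of the feed from the question (first entry only).
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">News</title>
    <id>1</id>
    <author />
    <entry xmlns:georss="http://www.georss.org/georss">
        <title type="html">TEST REST</title>
        <content type="html">1</content>
        <id>7</id>
        <georss:point>21.94420760726878 17.44</georss:point>
        <updated>2014-11-24T09:55:31.000Z</updated>
    </entry>
</feed>"""


def parse_entries(xml_text):
    """Return one dict per <entry>, holding the fields printed in the question."""
    # Parse from bytes so the encoding declaration is honoured by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for entry in root.findall("atom:entry", NS):
        entries.append({
            "title": entry.findtext("atom:title", namespaces=NS),
            "content": entry.findtext("atom:content", namespaces=NS),
            "point": entry.findtext("georss:point", namespaces=NS),
            "id": entry.findtext("atom:id", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
        })
    return entries


for fields in parse_entries(SAMPLE):
    print(fields)
```

Because the feed declares a default namespace, every Atom element has to be looked up through the namespace mapping; a bare entry.find('id') would return None here.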

Hope this helps.

Upvotes: 1
