Reputation: 510
I'm new to xml and REST but have some basic knowledge with python. I'm facing some issues while trying to parse the attached xml file.
I use Beautifulsoup library to parse the file and, for an unknown reason, I can access different fields of entries 2 and 3 but not entry 1, while they are all formatted the same way. Can someone tell what I'm doing wrong with my (attached) code and output please?
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">News</title>
<id>1</id>
<link href="" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
<updated>2014-11-26T10:41:12.424Z</updated>
<author />
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 3</summary>
<id>7</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T09:55:31.000Z</updated>
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 8</summary>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test POST</summary>
<id>12</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-25T14:29:02.000Z</updated>
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
</feed>
Python code:
#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
handler = open("/tmp/test.xml").read()
results = soup.findAll('entry')
for r in results:
print r
print r.find('title').text
print r.find('content').text
print r.find('georss:point')
print r.find('id')
print r.find('updated')
And the output is the following:
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
</entry>
TEST REST
1
None
None
None
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 8</summary>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>8</id>
<updated>2014-11-24T13:47:09.000Z</updated>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test POST</summary>
<id>12</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-25T14:29:02.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>12</id>
<updated>2014-11-25T14:29:02.000Z</updated>
Upvotes: 0
Views: 89
Reputation: 110
From what I have tested with the following code :
#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
handler = open("./test.xml").read()
soup = BeautifulSoup(handler)
print soup.prettify()
The ouput is like that :
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">
News
</title>
<id>
1
</id>
<link href="" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
<updated>
2014-11-26T10:41:12.424Z
</updated>
<author>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">
TEST REST
</title>
<content type="html">
1
</content>
</entry>
</author>
<author>
<name>
User213
</name>
</author>
If you look closely you will see that in your xml the <author />
is seen as an open tag by BeautifulSoup.
That's why you he don't find title, content.. because for him they are out of the tag.
Hope this`ll help
Upvotes: 1