Geroge

Reputation: 21

Python XML Parser with BeautifulSoup. How do I remove tags?

For a project I decided to make an app that helps people find friends on Twitter.

I have been able to grab usernames from XML pages. For example, with my current code I can get <uri>http://twitter.com/username</uri> from an XML page, but I want to remove the <uri> and </uri> tags using Beautiful Soup.

Here is my current code:

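import urllib
from BeautifulSoup import BeautifulStoneSoup

# Fetch the Atom search feed as a string (Python 2, BeautifulSoup 3)
doc = urllib.urlopen("http://search.twitter.com/search.atom?q=travel").read()

# Parse the XML and collect every <uri> element
soup = BeautifulStoneSoup(doc)
data = soup.findAll("uri")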

Upvotes: 1

Views: 1104

Answers (2)

johnsyweb

Reputation: 141898

To answer your question about BeautifulSoup: the .text attribute is what you need to grab the contents of each <uri> tag. Here I collect the contents into a list with a list comprehension:

>>> uris = [uri.text for uri in soup.findAll('uri')]
>>> len(uris)
15
>>> print uris[0]
http://twitter.com/MarieJeppesen

But, as zeekay says, Twitter's REST API is a better approach for querying Twitter.
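For readers without BeautifulSoup installed, the same "take the text, drop the tags" idea can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in parser, not the one used above, and it is written for Python 3. SAMPLE_FEED, the two usernames in it, and the helper name extract_uris are hypothetical stand-ins for the real search feed, which is assumed here to use the Atom namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the Atom search feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><author><name>alice</name><uri>http://twitter.com/alice</uri></author></entry>
  <entry><author><name>bob</name><uri>http://twitter.com/bob</uri></author></entry>
</feed>"""

# Atom elements live in this namespace, so the tag name must be qualified.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_uris(xml_text):
    """Return the text inside every <uri> element, without the tags."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(ATOM_NS + "uri") if el.text]


uris = extract_uris(SAMPLE_FEED)
print(uris)  # ['http://twitter.com/alice', 'http://twitter.com/bob']
```

As with uri.text in BeautifulSoup, el.text is already the bare string, so no separate tag-stripping step is needed.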

Upvotes: 0

Zach Kelling

Reputation: 53859

Don't use BeautifulSoup to parse Twitter; use their API (and if you do need to parse XML, use lxml rather than BeautifulSoup). To answer your question:

import urllib
from BeautifulSoup import BeautifulSoup

resp = urllib.urlopen("http://search.twitter.com/search.atom?q=travel")
soup = BeautifulSoup(resp.read())
for uri in soup.findAll('uri'):
    # extract() detaches the whole <uri> element (tag and contents)
    # from the tree and returns it
    uri.extract()
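Note that extract() in the snippet above detaches the element together with its text, so the URL has to be read before (or from the return value of) the call if it is still needed. A minimal sketch of that ordering follows, again using the standard library's xml.etree.ElementTree in Python 3 rather than BeautifulSoup. SAMPLE and text_then_detach are hypothetical names, and the namespace is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real feed is namespaced and much larger.
SAMPLE = "<author><name>alice</name><uri>http://twitter.com/alice</uri></author>"


def text_then_detach(xml_text):
    """Read the <uri> text, then remove the element from its parent."""
    author = ET.fromstring(xml_text)
    uri = author.find("uri")
    url = uri.text        # keep the text before detaching the element
    author.remove(uri)    # the tree no longer contains <uri> or its URL
    remaining = ET.tostring(author, encoding="unicode")
    return url, remaining


url, remaining = text_then_detach(SAMPLE)
print(url)        # http://twitter.com/alice
print(remaining)  # <author><name>alice</name></author>
```

Removing the element suits cleaning up the tree; reading its text suits collecting the usernames, which is what the question asks for.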

Upvotes: 1
