Reputation: 21
For a project I decided to make an app that helps people find friends on Twitter.
I have been able to grab usernames from XML pages. For example, with my current code I can get <uri>http://twitter.com/username</uri> from an XML page, but I want to remove the <uri> and </uri> tags using Beautiful Soup.
Here is my current code:
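import urllib
# BeautifulStoneSoup is the XML parser class in BeautifulSoup 3; import it by name
from BeautifulSoup import BeautifulStoneSoup
doc = urllib.urlopen("http://search.twitter.com/search.atom?q=travel").read()
# read() already returns a single string, so it can be passed in directly
soup = BeautifulStoneSoup(doc)
data = soup.findAll("uri")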
Upvotes: 1
Views: 1104
Reputation: 141898
To answer your question about BeautifulSoup: the text attribute is what you need to grab the contents of each <uri> tag. Here I collect the text of every tag with a list comprehension:
>>> uris = [uri.text for uri in soup.findAll('uri')]
>>> len(uris)
15
>>> print uris[0]
http://twitter.com/MarieJeppesen
But, as zeekay says, Twitter's REST API is a better approach for querying Twitter.
Upvotes: 0
Reputation: 53859
Don't use BeautifulSoup to parse Twitter; use their API (and if you do need to parse markup, use lxml rather than BeautifulSoup). To answer your question:
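import urllib
from BeautifulSoup import BeautifulSoup
resp = urllib.urlopen("http://search.twitter.com/search.atom?q=travel")
soup = BeautifulSoup(resp.read())
for uri in soup.findAll('uri'):
    # extract() detaches the whole <uri> element from the tree and returns it;
    # uri.string holds just the URL text between the tags.
    uri.extract()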
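For comparison, here is a minimal sketch of the same idea without BeautifulSoup. It uses the standard library's xml.etree.ElementTree, which is not lxml but has a similar element API, so treat it as a stand-in for the lxml suggestion rather than the exact approach. SAMPLE_FEED and extract_uris are hypothetical names made up for illustration, and the inline feed is an assumed stand-in for the real search.atom response, which I have not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the Atom search feed (assumption:
# the real feed puts each <uri> inside an <author> element like this).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author>
      <name>alice</name>
      <uri>http://twitter.com/alice</uri>
    </author>
  </entry>
  <entry>
    <author>
      <name>bob</name>
      <uri>http://twitter.com/bob</uri>
    </author>
  </entry>
</feed>
"""

# Atom elements live in this namespace, so the tag name must be qualified.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_uris(xml_text):
    """Return the text inside every Atom <uri> element, without the tags."""
    root = ET.fromstring(xml_text)
    # el.text is the content between <uri> and </uri>; skip empty elements.
    return [el.text.strip() for el in root.iter(ATOM_NS + "uri") if el.text]


uris = extract_uris(SAMPLE_FEED)
print(uris)
```

The namespace prefix is the part that usually trips people up: searching for plain "uri" finds nothing in a namespaced Atom document with ElementTree.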
Upvotes: 1