BeautifulSoup webscraping...getting just the text

Question

I'm trying to extract the following text from a web page:

    28th & B St Skatepark       #This is what I'm trying to grab, just the text.

With my code

import urllib2
from bs4 import BeautifulSoup

url1 = "http://www.thrashermagazine.com/skateparks/search-results_m94/?cat=61&jr_state=CA&order=alpha&query=all"
content1 = urllib2.urlopen(url1).read()
soup = BeautifulSoup(content1)
print soup.findAll('a')

I get something like this in return.

, , Log in, Register, Home, Store, Thrasher Skateboard Magazine | Videos, Features, Thrasher Skateboard Magazine | Events,

I understand that that's exactly what I'm asking my script to do, but I want to know if there's a way to get just the text that I've indicated rather than everything associated with the tag.

James Mills · Accepted Answer

Use the .text attribute of each element returned by findAll, e.g.:

import urllib2
from bs4 import BeautifulSoup  # same package the question imports

url1 = "http://www.thrashermagazine.com/skateparks/search-results_m94/?cat=61&jr_state=CA&order=alpha&query=all"
content1 = urllib2.urlopen(url1).read()
soup = BeautifulSoup(content1)
# .text returns only the text inside each <a> tag, without the markup
print [e.text for e in soup.findAll('a')]
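
For anyone who wants to try this without fetching the site, here is a minimal sketch in Python 3 syntax using bs4. The HTML snippet, the "listing" class, and the link_texts helper are all made up for illustration; the real page's markup is not shown in the question and may be structured differently. The sketch uses get_text(strip=True), which returns the same text as .text but with surrounding whitespace removed, and it skips links that contain no text (such as image-only links).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which is not reproduced here.
html = """
<div id="nav"><a href="/login">Log in</a> <a href="/register">Register</a></div>
<div class="listing"><a href="/parks/1"> 28th &amp; B St Skatepark </a></div>
<a href="/empty"><img src="logo.png"></a>
"""

# "html.parser" is Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")


def link_texts(soup):
    """Return the stripped text of every <a> tag, skipping links with no text."""
    texts = []
    for a in soup.find_all("a"):
        text = a.get_text(strip=True)
        if text:
            texts.append(text)
    return texts


print(link_texts(soup))

# To get only the links of interest, narrow the search to their container first.
# The "listing" class is an assumption made for this example.
listing = soup.find("div", class_="listing")
park_names = [a.get_text(strip=True) for a in listing.find_all("a")]
print(park_names)
```

On the real page you would need to inspect the HTML to find which element or class actually wraps the skatepark links, then substitute it for the assumed "listing" container.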
