What is this function doing in Python involving urllib2 and BeautifulSoup?

Question

So I asked a question earlier about retrieving high scores from an HTML page, and another user gave me the following code to help. I am new to Python and BeautifulSoup, so I'm trying to work through the code piece by piece. I understand most of it, but I don't get what this piece of code does:

    def parse_string(el):
        text = ''.join(el.findAll(text=True))
        return text.strip()

Here is the entire code:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup
    import sys

    URL = "http://hiscore.runescape.com/hiscorepersonal.ws?user1=" + sys.argv[1]

    # Grab page html, create BeautifulSoup object
    html = urlopen(URL).read()
    soup = BeautifulSoup(html)

    # Grab the <table> element
    scores = soup.find('table', {'id':'mini_player'})

    # Get a list of all the <tr>s in the table, skip the header row
    rows = scores.findAll('tr')[1:]

    # Helper function to return concatenation of all character data in an element
    def parse_string(el):
        text = ''.join(el.findAll(text=True))
        return text.strip()

    for row in rows:

        # Get all the text from the <td>s
        data = map(parse_string, row.findAll('td'))

        # Skip the first td, which is an image
        data = data[1:]

        # Do something with the data...
        print data
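
For readers without network access or BeautifulSoup installed, the loop above can be approximated offline. The sketch below is a hypothetical Python 3 analogue, not the original script: it swaps in the standard-library html.parser for BeautifulSoup, and SAMPLE is made-up markup containing a mini_player table, not the real hiscore page (whose layout may differ). MiniPlayerTable is an invented name, and it assumes the table contains no nested tables.

```python
from html.parser import HTMLParser

# Made-up markup shaped like what the script expects; NOT the real
# hiscore page, whose layout may differ.
SAMPLE = """
<table id="mini_player">
  <tr><th></th><th>Skill</th><th>Rank</th><th>Level</th></tr>
  <tr><td><img src="attack.gif"></td><td> <a href="#">Attack</a> </td>
      <td>1,234</td><td>99</td></tr>
  <tr><td><img src="defence.gif"></td><td> <a href="#">Defence</a> </td>
      <td>5,678</td><td>90</td></tr>
</table>
"""


class MiniPlayerTable(HTMLParser):
    """Hypothetical stdlib stand-in for soup.find('table', {'id': 'mini_player'})
    followed by findAll('tr') / findAll('td'). Assumes no nested tables."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.rows = []    # one list of cell strings per <tr>
        self.cell = None  # text pieces of the <td> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and dict(attrs).get('id') == 'mini_player':
            self.in_table = True
        elif self.in_table and tag == 'tr':
            self.rows.append([])
        elif self.in_table and tag == 'td':
            self.cell = []

    def handle_endtag(self, tag):
        if tag == 'table':
            self.in_table = False
        elif tag == 'td' and self.cell is not None:
            # Same idea as parse_string: join the text pieces, then strip.
            self.rows[-1].append(''.join(self.cell).strip())
            self.cell = None

    def handle_data(self, data):
        # Only keep text that sits inside a <td>; whitespace between tags is ignored.
        if self.cell is not None:
            self.cell.append(data)


parser = MiniPlayerTable()
parser.feed(SAMPLE)
parser.close()

for cells in parser.rows[1:]:  # skip the header row, like rows[1:]
    data = cells[1:]           # skip the first <td>, which holds an image
    print(data)
```

With this sample it prints ['Attack', '1,234', '99'] and then ['Defence', '5,678', '90']. One Python 3 detail: map returns an iterator rather than a list, so the original data = map(...) followed by data[1:] would need list(map(parse_string, row.findAll('td'))) there; the sketch sidesteps this by building plain lists.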

Eli Courtwright · Accepted Answer

el.findAll(text=True) returns a list of all the text strings contained within an element and its sub-elements. By text I mean everything not inside a tag; so in <b>hello</b>, "hello" would be the text but <b> and </b> would not.
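
To see which pieces count as text, here is a rough standard-library analogue using Python 3's html.parser rather than BeautifulSoup itself: handle_data receives only the character data between tags, much like the strings findAll(text=True) collects. TextCollector and text_nodes are hypothetical names, and details such as how comments are treated may differ from BeautifulSoup.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every run of character data; a rough analogue of what
    findAll(text=True) returns for a BeautifulSoup element."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called only for text between tags; the tags themselves
        # (<b>, </b>) go to handle_starttag/handle_endtag instead.
        self.pieces.append(data)


def text_nodes(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.pieces


print(text_nodes("<b>hello</b>"))                  # ['hello']
print(text_nodes("<td> <b>Attack</b> 99 </td>"))   # [' ', 'Attack', ' 99 ']
```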

That function therefore joins together all the text found beneath the given element into a single string and strips leading and trailing whitespace from the result.
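
The join-and-strip step can be shown on its own. The fragments below are made up to resemble what findAll(text=True) might return for a cell like <td> <b>Attack</b> 99 </td>; join_and_strip is a hypothetical helper that mirrors the body of parse_string once the fragments are known.

```python
def join_and_strip(fragments):
    """Mirror the body of parse_string: concatenate the text fragments
    in document order, then trim the outer whitespace."""
    text = ''.join(fragments)
    return text.strip()


# Made-up fragments for a cell like "<td> <b>Attack</b> 99 </td>".
cell_fragments = [' ', 'Attack', ' 99 ']
print(repr(''.join(cell_fragments)))         # ' Attack 99 '
print(repr(join_and_strip(cell_fragments)))  # 'Attack 99'

# strip() only trims the ends; whitespace between fragments survives.
print(repr(join_and_strip(['\n  ', 'Attack', '\n  99\n'])))  # 'Attack\n  99'
```

Note the last case: strip() does not touch whitespace in the middle of the joined string, so a cell whose text spans several lines keeps its inner newlines.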

Here's a link to the findAll documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-text
