aroma

Reputation: 1421

What is Python's .get() method doing in this scenario?

I was going through a tutorial on scraping list data from a web page. We have a BeautifulSoup object named 'soup', and I am supposed to find all the elements in 'soup' that are tables and have a class. The tutorial did this:

> [t["class"] for t in soup.find_all("table") if t.get("class")]

I don't understand two things here. First, what is t["class"] doing? Why didn't we simply write t, since the if condition is already applied on the right? Why do we need t["class"] in the first place?

Second, why are we using the .get() method as a boolean in this case? Doesn't it return the value stored for a key in a dictionary?

Does it mean the beautiful soup object is a dictionary?

Upvotes: 2

Views: 85

Answers (3)

bruno desthuilliers

Reputation: 77912

> what is t["class"] doing in here why didn't we simply write t

Obviously because the author wanted to retrieve the class attribute of the tag, not the full tag.

> why are we using .get() method as boolean in this case, I mean does it not return the value stored for a key in a dictionary?

dict.get(key[, default=None]) does indeed return the value for key key if it's set or default (which defaults to None) if it isn't.

The goal here is obviously to only get class for tags having one.
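To make the "used as a boolean" part concrete, here is a minimal sketch with plain dicts standing in for tag attributes. The `rows` data is made up for illustration and is not from the tutorial.

```python
# Hypothetical stand-ins for tag attributes, written as plain dicts.
rows = [
    {"class": ["data"], "id": "t1"},
    {"id": "t2"},               # no "class" key at all
    {"class": [], "id": "t3"},  # key present, but the value is empty
]

# .get() returns None for a missing key instead of raising KeyError.
# None and the empty list are both falsy, so the `if` skips those rows.
classes = [row["class"] for row in rows if row.get("class")]
print(classes)
```

Only the first row survives the filter, so indexing with row["class"] on the left is safe: it never runs for a row that lacks the key.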

> Does it mean the beautiful soup object is a dictionary?

Here 't' is not "the beautiful soup object", it's a Tag instance. And while it is not strictly a dict, it does behave like one with respect to HTML attributes. This is documented, FWIW.
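A small sketch of that dict-like behaviour, assuming the bs4 package is installed and using the built-in "html.parser". The HTML snippet is invented for the example.

```python
from bs4 import BeautifulSoup

# Made-up markup: two tables with a class attribute, one without.
html = """
<table class="data wide"><tr><td>a</td></tr></table>
<table><tr><td>b</td></tr></table>
<table class="plain"><tr><td>c</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")

# Attribute access on a Tag looks like dict access.
first_class = tables[0]["class"]        # "class" is multi-valued, so this is a list
second_class = tables[1].get("class")   # missing attribute: None, no KeyError
has_class = tables[1].has_attr("class")

# The comprehension from the question: keep the class of every table that has one.
classes = [t["class"] for t in tables if t.get("class")]
print(first_class, second_class, has_class, classes)
```

Writing tables[1]["class"] instead of .get() would raise KeyError, which is why the filter uses .get().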

Upvotes: 2

Adam Smith

Reputation: 54223

dict.get returns the value associated with the key it's given, or None if the key is missing. As an example:

>>> foo = {'spam': 'eggs'}
>>> foo.get('spam')
'eggs'
>>> foo['spam']
'eggs'
>>> print(foo.get('bar'))  # the REPL shows nothing for a bare None, so print it
None
>>> foo['bar']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'bar'

With a plain dict you'd typically just check for membership ('class' in d) before including the value. A BeautifulSoup Tag is not a plain dict, though: `in` on a Tag tests its children rather than its attributes, so the equivalent check there is has_attr:

[t['class'] for t in soup.find_all('table') if t.has_attr('class')]

Or more rarely use dict.get in the selector and filter out the None objects afterwards

tmp = [t.get('class') for t in soup.find_all('table')]
# filter() takes the function first; passing None drops every falsy value
result = list(filter(None, tmp))
# this is equivalent to:
# result = [v for v in tmp if v]

Upvotes: 1

Wonka

Reputation: 1901

This is an example from your tutorial; you probably want to get the text, not the class.

Here is the list comprehension written as a "for" loop:

result = []
tables = soup.find_all("table")
for t in tables:
    if t.get("class"):  # check that the table has a class attribute
        result.append(t["class"])  # you probably want the table's text rather than its class name
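
If the text is what you are after, a minimal sketch could look like the following. It assumes bs4 is installed; the markup and the names class_names and cell_texts are made up for illustration.

```python
from bs4 import BeautifulSoup

# Invented markup: one table with a class, one without.
html = """
<table class="prices"><tr><td>apples</td><td>3</td></tr></table>
<table><tr><td>no class here</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

class_names = []
cell_texts = []
for t in soup.find_all("table"):
    if t.get("class"):  # skip tables without a class attribute
        class_names.append(t["class"])
        # Collect the text of each cell instead of the class name.
        cell_texts.append([td.get_text(strip=True) for td in t.find_all("td")])

print(class_names, cell_texts)
```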

Upvotes: 0
