Reputation: 1421
I was going through a tutorial on scraping list data from a web page. We have a BeautifulSoup object named 'soup', and I am supposed to find all the table elements in 'soup' that have a class, so they did this:
> [t["class"] for t in soup.find_all("table") if t.get("class")]
I don't understand two things here. First, what is t["class"] doing here? Why didn't we simply write t? Since the if condition is applied on the right, why do we need t["class"] in the first place?
Second, why are we using the .get() method as a boolean in this case? Doesn't it return the value stored for a key in a dictionary?
Does it mean the BeautifulSoup object is a dictionary?
Upvotes: 2
Views: 85
Reputation: 77912
"what is t["class"] doing in here why didn't we simply write t"
Obviously because the author wanted to retrieve the class attribute of the tag, not the full tag.
why are we using .get() method as boolean in this case, I mean does it not return the value stored for a key in a dictionary?
dict.get(key[, default=None]) does indeed return the value for key if it's set, or default (which defaults to None) if it isn't.
Since None is falsy, the goal here is obviously to only get class for tags having one.
Does it mean the beautiful soup object is a dictionary?
Here 't' is not "the beautiful soup object", it's a Tag instance. And while it is not strictly a dict, it does behave like one with respect to HTML attributes. This is documented, FWIW.
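To make the dict-like behaviour concrete, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration, and it assumes bs4 is installed and uses the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two tables with a class, one without.
html = (
    '<table class="data wide"><tr><td>1</td></tr></table>'
    '<table><tr><td>2</td></tr></table>'
    '<table class="summary"><tr><td>3</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")

# Indexing a Tag looks up an HTML attribute, like indexing a dict.
# "class" is multi-valued, so the result is a list of class names.
first_class = tables[0]["class"]

# .get() returns None instead of raising KeyError when the attribute is absent.
missing = tables[1].get("class")

# has_attr() is the attribute-membership test on a Tag.
has_class = [t.has_attr("class") for t in tables]

# The comprehension from the question: keep only the class lists that exist.
classes = [t["class"] for t in tables if t.get("class")]
```

Writing just t in the comprehension would collect the whole table tags, while t["class"] collects only their class lists.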
Upvotes: 2
Reputation: 54223
dict.get returns the value associated with the key it's given, or None if the key is missing. As an example:
>>> foo = {'spam': 'eggs'}
>>> foo.get('spam')
'eggs'
>>> foo['spam']
'eggs'
>>> print(foo.get('bar'))
None
>>> foo['bar']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'bar'
I'm not familiar with BeautifulSoup, so it might be necessary in this case to do something like this, but typically you'd just check for membership before including the value:
[t['class'] for t in soup.find_all('table') if 'class' in t]
Or, more rarely, use dict.get in the selector and filter out the None objects afterwards:
tmp = [t.get('class') for t in soup.find_all('table')]
result = list(filter(None, tmp))
# this is equivalent to:
# result = [v for v in tmp if v]
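As a rough comparison of the approaches above, here is a sketch using plain dicts only (the rows list is invented sample data). Note the membership test and the truthiness test are not quite the same: membership keeps empty values, truthiness drops them. Also, this applies to real dicts; on a BeautifulSoup Tag the in operator tests the tag's children rather than its attributes, so t.has_attr('class') would be the membership check there.

```python
# Hypothetical sample data standing in for attribute dicts.
rows = [{"class": ["data"]}, {}, {"class": []}, {"class": ["summary"]}]

# Membership check: keeps every row that has the key, even with an empty value.
by_membership = [r["class"] for r in rows if "class" in r]

# dict.get in the selector, then drop falsy results (None and empty lists).
tmp = [r.get("class") for r in rows]
by_filter = list(filter(None, tmp))

# Same filtering written as a comprehension.
by_truthiness = [v for v in tmp if v]
```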
Upvotes: 1
Reputation: 1901
This is an example from your tutorial; you probably want to get the text, not the class.
I will write the list comprehension as a "for" loop:
result = []
tables = soup.find_all("table")
for t in tables:
    if t.get("class"):  # Check whether the table has a class attribute
        result.append(t["class"])  # You probably don't want the class name of the table; maybe you want the text
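If the text is what you are after, a variant of the loop could look like the sketch below. The markup is invented for illustration, and it assumes bs4 with the built-in "html.parser" backend; get_text() is used here as one way to pull the text out of a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: only the first table has a class.
html = (
    '<table class="data"><tr><td>a</td><td>b</td></tr></table>'
    '<table><tr><td>c</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

texts = []
for t in soup.find_all("table"):
    if t.get("class"):  # skip tables without a class attribute
        # Join the cell strings with a space and strip surrounding whitespace.
        texts.append(t.get_text(" ", strip=True))
```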
Upvotes: 0