Reputation: 1147
Let's say we have HTML like this (sorry, I don't know how to copy and paste page info and this is on an intranet):
And I want to get the highlighted portion for all of the questions (this is like a Stack Overflow page). EDIT: to be clearer, what I am interested in is getting a list that has:
['question-summary-39968',
'question-summary-40219',
'question-summary-42899',
'question-summary-34348',
'question-summary-32497',
'question-summary-35308',
...]
Now I know that a working solution is a list comprehension where I could do:
[item["id"] for item in html_df.find_all(class_="question-summary")]
But this is not exactly what I want. How can I directly access question-summary-41823 for the first item?
Also, what is the difference between soup.select and soup.get?
Upvotes: 0
Views: 59
Reputation: 1147
I thought I would post my answer here if it helps others.
What I am trying to do is access the id attribute of each element that has the question-summary class.
You can do something like this to get it for only the first item (find returns a single Tag object):
html_df.find(class_="question-summary")["id"]
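To make the "first item" case concrete, here is a minimal sketch. The HTML is a made-up stand-in for the intranet page (only the question-summary class and the id pattern come from the question), and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page is not available, so only the class
# name and the id pattern are taken from the question.
html = """
<div id="questions">
  <div class="question-summary" id="question-summary-39968"></div>
  <div class="question-summary" id="question-summary-40219"></div>
</div>
"""

html_df = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag (or None if nothing matches).
first_id = html_df.find(class_="question-summary")["id"]

# select_one() does the same thing with a CSS selector.
first_id_css = html_df.select_one(".question-summary")["id"]

# select() returns a list, so an integer index works as well.
first_id_idx = html_df.select(".question-summary")[0]["id"]

print(first_id)
```

All three lines read the id of the first match; find and select_one avoid building the full list when only one element is needed.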
But you want it for all of them. So you could do this to get every element with that class:
html_df.select('.question-summary')
But you can't just do
html_df.select('.question-summary')["id"]
because select returns a list of bs4.element.Tag objects, and a list can't be indexed with a string. So you need to iterate over the list and pull out just the piece that you want. You could write a for loop, but a more elegant way is to use a list comprehension:
[item["id"] for item in html_df.find_all(class_="question-summary")]
Breaking down what this does, it:
1. Finds all question-summary objects from the soup
2. For each item, takes the id attribute and adds it to the list
Alternatively, you can use select:
[item["id"] for item in html_df.select(".question-summary")]
I prefer the first version because it's more explicit, but either one results in:
['question-summary-43960',
'question-summary-43953',
'question-summary-43959',
'question-summary-43947',
'question-summary-43952',
'question-summary-43945',
...]
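On the select versus get part of the question, a rough sketch of the difference as I understand it: select takes a CSS selector and returns a list of matching Tag objects, while get reads a single attribute from one Tag and returns None when the attribute is missing. The HTML below is invented for illustration, and data-missing is a hypothetical attribute name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="question-summary" id="question-summary-39968"></div>
<div class="question-summary" id="question-summary-40219"></div>
"""

html_df = BeautifulSoup(html, "html.parser")

# select() searches the tree and returns a list-like ResultSet of Tags.
matches = html_df.select(".question-summary")

# Indexing that list with a string fails, which is why a loop is needed.
try:
    matches["id"]
    error = None
except TypeError as exc:
    error = exc

# Both spellings of the comprehension give the same list of ids.
ids_select = [item["id"] for item in matches]
ids_find_all = [item["id"] for item in html_df.find_all(class_="question-summary")]

# get() works on a single Tag and reads one attribute, like dict.get:
# item["x"] raises KeyError for a missing attribute, item.get("x") returns None.
first = matches[0]
present = first.get("id")
missing = first.get("data-missing")

print(ids_select)
```

If some elements might lack an id, item.get("id") inside the comprehension avoids a KeyError, at the cost of None entries in the result.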
Upvotes: 1