Reputation: 1140
I just started using BeautifulSoup and I am running into a problem. I set up an HTML snippet below and made a BeautifulSoup object from it:
html_snippet = '<p class="course"><span class="text84">Ae 100. Research in Aerospace. </span><span class="text85">Units to be arranged in accordance with work accomplished. </span><span class="text83">Open to suitably qualified undergraduates and first-year graduate students under the direction of the staff. Credit is based on the satisfactory completion of a substantive research report, which must be approved by the Ae 100 adviser and by the option representative. </span> </p>'
subject = BeautifulSoup(html_snippet)
I have tried several find and find_all operations like the ones below, but all I get back is nothing or an empty list:
subject.find(text = 'A')
subject.find(text = 'Research')
subject.next_element.find('A')
subject.find_all(text = 'A')
When I created the BeautifulSoup object from an HTML file on my computer before, the find and find_all operations all worked fine. However, now that I pull the html_snippet from a webpage read through urllib2, I am having problems.
Can anyone point out where the issue is?
Upvotes: 2
Views: 1704
Reputation: 10574
Pass the argument like this:
import re
subject.find(text=re.compile('A'))
The default behavior for the text filter is to match on the entire body: a plain string only matches a text node that is exactly equal to it. Passing in a regular expression lets you match on fragments.
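Here is a small sketch of that difference, assuming bs4 and the built-in "html.parser" backend (the question does not say which parser was used). The snippet is a shortened copy of the one in the question, and the variable names exact_miss, exact_hit and regex_hit are just for illustration:

```python
import re
from bs4 import BeautifulSoup

# Shortened version of the snippet from the question.
html_snippet = (
    '<p class="course">'
    '<span class="text84">Ae 100. Research in Aerospace. </span>'
    '<span class="text85">Units to be arranged in accordance with work accomplished. </span>'
    '</p>'
)
# Parser choice is an assumption; the question omits it.
subject = BeautifulSoup(html_snippet, "html.parser")

# A plain string has to equal a whole text node, so a fragment finds nothing.
exact_miss = subject.find(text="Research")

# The full text of the node, trailing space included, does match.
exact_hit = subject.find(text="Ae 100. Research in Aerospace. ")

# A compiled pattern is searched for inside each text node.
regex_hit = subject.find(text=re.compile("Research"))

print(repr(exact_miss))
print(repr(exact_hit))
print(repr(regex_hit))
```

Newer bs4 releases also accept string= as the name for this argument; text= is kept here to stay consistent with the question.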
EDIT: To match only bodies beginning with A, you can use the following:
subject.find(text=re.compile('^A'))
To match only bodies containing words that begin with A, you can use:
subject.find_all(text = re.compile(r'\bA'))
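To see how the two patterns differ, here is a rough sketch on an abridged version of the question's snippet (the second span is shortened by me, and "html.parser" is an assumed parser choice). The names starts_with_a and word_starts_with_a are only illustrative:

```python
import re
from bs4 import BeautifulSoup

# Abridged snippet: the second span is a shortened stand-in for the original text.
html_snippet = (
    '<p class="course">'
    '<span class="text84">Ae 100. Research in Aerospace. </span>'
    '<span class="text83">Credit must be approved by the Ae 100 adviser. </span>'
    '</p>'
)
# Parser choice is an assumption; the question omits it.
subject = BeautifulSoup(html_snippet, "html.parser")

# '^A' only accepts text nodes whose first character is a capital A.
starts_with_a = subject.find_all(text=re.compile('^A'))

# r'\bA' accepts any text node containing a word that starts with a capital A.
word_starts_with_a = subject.find_all(text=re.compile(r'\bA'))

print(len(starts_with_a), len(word_starts_with_a))
```

Both patterns are case-sensitive, so words like "approved" or "adviser" do not count. Passing re.IGNORECASE to re.compile would change that.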
It's difficult to tell more specifically what you're looking for, so let me know if I've misinterpreted what you're asking.
Upvotes: 4