Extracting texts after <br> with BeautifulSoup

Question

I have a series of web pages I want to scrape text from, but unfortunately they all follow different patterns. I'm trying to write a scraper that extracts the text after <br> tags, as that structure is common to all pages.

The pages follow three basic patterns as best I can tell:

As I have it now, I'm scraping with the following loop:

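    # soup is an already-parsed BeautifulSoup object
    for br in soup.find_all('br'):
        # next_sibling is only the single node directly after the <br>
        text = br.next_sibling

        try:
            # collapse tabs and line breaks into spaces
            print(text.strip().replace("\t", " ").replace("\r", " ").replace('\n', ' '))
        except AttributeError:
            # next_sibling is None when nothing follows the <br> in its parent
            print('...')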

This script works for some pages, but grabs only some or none of the text on others. I've been tearing my hair out over this for the last few days, so any help would be greatly appreciated.

Also, I tried this technique already, but couldn't make it work for all the pages.

Extracting texts after <br> with BeautifulSoup
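
For reference, here is a minimal sketch of one way the loop could be widened. It rests on an assumption the page patterns would need to confirm: that the missed text sits in later siblings of the <br>, or inside inline tags such as <b>, rather than in the single node directly after it. The function name `text_after_brs` and the sample HTML are hypothetical and exist only for illustration. Instead of `next_sibling`, it walks `next_siblings` and gathers everything up to the next <br>.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def text_after_brs(html):
    """Return one whitespace-normalized string per <br>,
    covering every sibling up to (not including) the next <br>."""
    soup = BeautifulSoup(html, "html.parser")
    chunks = []
    for br in soup.find_all("br"):
        parts = []
        for node in br.next_siblings:
            # stop once the next <br> starts a new chunk
            if isinstance(node, Tag) and node.name == "br":
                break
            if isinstance(node, NavigableString):
                parts.append(str(node))
            else:
                # a nested tag such as <b>: take its text content
                parts.append(node.get_text())
        # collapse tabs, newlines and repeated spaces into single spaces
        text = " ".join("".join(parts).split())
        if text:
            chunks.append(text)
    return chunks


# hypothetical sample markup, not taken from the real pages
sample = "<p>intro<br>first line<br><b>bold</b> second\n line<br></p>"
print(text_after_brs(sample))  # ['first line', 'bold second line']
```

This only looks at siblings within the same parent, so text that follows a <br> but lives in a different parent element would still be missed.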
