René

Reputation: 4827

BeautifulSoup, select text to extract

I would like to scrape some quotes and their authors, but I haven't found a way to separate the quote from the author while scraping.

import requests
from bs4 import BeautifulSoup

#url = 'https://www.goodreads.com/quotes'
#r = requests.get(url)
#soup = BeautifulSoup(r.content, 'html.parser')

html = """
       <div class="quoteText">&ldquo;Insanity is doing the same thing, over and over again, but expecting different results.&rdquo; <br>  &#8213;
       <span class="authorOrTitle">Narcotics Anonymous</span>
       </div>
"""

soup = BeautifulSoup(html, 'html.parser')

quotes = soup.find_all('div', {'class': 'quoteText'})

for quote in quotes:
    if quote.text is not None:
        print(quote.text)

Upvotes: 0

Views: 865

Answers (2)

Andersson

Reputation: 52685

You can try the stripped_strings property, which yields each text fragment inside the tag with surrounding whitespace removed:

for quote in quotes:
    # For the sample HTML, stripped_strings yields three items:
    # the quote body, the "―" separator, and the author.
    strings = list(quote.stripped_strings)
    quote_body = strings[0]
    quote_author = strings[2]
    print(quote_body)
    print(quote_author)
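
Relying on index 2 assumes every div has exactly the same layout. As a hedged alternative, the author can be looked up by the authorOrTitle class that appears in the sample HTML, so it does not depend on position. This is only a minimal sketch: the helper name extract_quote is hypothetical, and it assumes the quote body is always the first non-empty string in the div.

```python
from bs4 import BeautifulSoup

html = """
<div class="quoteText">&ldquo;Insanity is doing the same thing, over and over again, but expecting different results.&rdquo; <br>  &#8213;
<span class="authorOrTitle">Narcotics Anonymous</span>
</div>
"""


def extract_quote(div):
    """Return (quote_body, author) for one quoteText div (hypothetical helper).

    Either value is None when the corresponding piece is missing.
    """
    # Find the author by its class instead of by its position in the div.
    author_tag = div.find('span', class_='authorOrTitle')
    author = author_tag.get_text(strip=True) if author_tag else None
    # Assumption: the first non-empty string in the div is the quote body.
    body = next(div.stripped_strings, None)
    return body, author


soup = BeautifulSoup(html, 'html.parser')
results = [extract_quote(div) for div in soup.find_all('div', class_='quoteText')]

for body, author in results:
    print(body)
    print(author)
```

If a div has no author span, the author comes back as None instead of raising an IndexError.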

Upvotes: 3

madik_atma

Reputation: 799

import requests
from bs4 import BeautifulSoup

#url = 'https://www.goodreads.com/quotes'
#r = requests.get(url)
#soup = BeautifulSoup(r.content, 'html.parser')

html = """
       <div class="quoteText">&ldquo;Insanity is doing the same thing, over and over again, but expecting different results.&rdquo; <br>  &#8213;
       <span class="authorOrTitle">Narcotics Anonymous</span>
       </div>
"""

soup = BeautifulSoup(html, 'html.parser')

quotes = soup.find_all('div', {'class': 'quoteText'})

for quote in quotes:
    # Split the div's text on the horizontal bar (―) that precedes the author.
    quote_data = quote.text.split(" ―")
    quote_without_author = quote_data[0]
    quote_author = quote_data[1]
    print(quote_without_author.strip())
    print(quote_author.strip())

You can split the text on ―, so that element [0] is your quote and element [1] is your author.

Output:

“Insanity is doing the same thing, over and over again, but expecting different results.”
Narcotics Anonymous
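
Indexing [1] after split raises an IndexError when a div has no ― separator. A small sketch of a more tolerant variant, using str.rpartition so the text is split on the last ― only; the function name split_quote is hypothetical, and it assumes the author always follows the final horizontal bar (U+2015):

```python
def split_quote(text, sep="\u2015"):
    """Split 'quote ― author' text into (quote, author).

    Returns (text, None) when the separator is absent.
    Hypothetical helper; assumes the author follows the last separator.
    """
    body, found, author = text.rpartition(sep)
    if not found:
        # No separator: treat the whole text as the quote.
        return text.strip(), None
    return body.strip(), author.strip()


sample = "\u201cInsanity is doing the same thing.\u201d   \u2015\n       Narcotics Anonymous\n"
quote_text, quote_author = split_quote(sample)
print(quote_text)
print(quote_author)
```

In the loop above, this would be called as split_quote(quote.text).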

Upvotes: 0
