Reputation: 4827
I would like to scrape some quotes and authors but haven't found a way to separate the quote from the author during scraping.
import requests
from bs4 import BeautifulSoup
#url = 'https://www.goodreads.com/quotes'
#r = requests.get(url)
#soup = BeautifulSoup(r.content, 'html.parser')
html = """
<div class="quoteText">“Insanity is doing the same thing, over and over again, but expecting different results.” <br> ―
<span class="authorOrTitle">Narcotics Anonymous</span>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
quotes = soup.find_all('div', {'class': 'quoteText'})
for quote in quotes:
    if quote.text is not None:
        print(quote.text)
Upvotes: 0
Views: 865
Reputation: 52685
You can try the stripped_strings property, which yields each text fragment inside the tag with surrounding whitespace removed:
for quote in quotes:
    if quote.text is not None:
        # Fragments: [0] quote body, [1] the "―" separator, [2] author
        strings = list(quote.stripped_strings)
        quote_body = strings[0]
        quote_author = strings[2]
        print(quote_body)
        print(quote_author)
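If relying on index positions feels brittle, one alternative is to read the author from its own span and take the first stripped string as the quote. This is only a minimal sketch: it assumes each quoteText div puts the quote first and wraps the author in an authorOrTitle span, as in the sample HTML from the question, and extract_quote is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

html = """
<div class="quoteText">“Insanity is doing the same thing, over and over again, but expecting different results.” <br> ―
<span class="authorOrTitle">Narcotics Anonymous</span>
</div>
"""

def extract_quote(div):
    """Return (quote, author) for one quoteText div (hypothetical helper)."""
    # Assumption: the first non-empty text fragment is the quote body.
    body = next(div.stripped_strings, None)
    # Assumption: the author sits in <span class="authorOrTitle">.
    span = div.find('span', class_='authorOrTitle')
    author = span.get_text(strip=True) if span else None
    return body, author

soup = BeautifulSoup(html, 'html.parser')
results = [extract_quote(div) for div in soup.find_all('div', class_='quoteText')]
for body, author in results:
    print(body)
    print(author)
```

When the span is missing, the author comes back as None instead of raising an IndexError.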
Upvotes: 3
Reputation: 799
import requests
from bs4 import BeautifulSoup
#url = 'https://www.goodreads.com/quotes'
#r = requests.get(url)
#soup = BeautifulSoup(r.content, 'html.parser')
html = """
<div class="quoteText">“Insanity is doing the same thing, over and over again, but expecting different results.” <br> ―
<span class="authorOrTitle">Narcotics Anonymous</span>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
quotes = soup.find_all('div', {'class': 'quoteText'})
for quote in quotes:
    if quote.text is not None:
        quote_ = quote.text
        quote_data = quote_.split(" ―")
        quote_without_author = quote_data[0]
        quote_author = quote_data[1]
        print(quote_without_author.strip())
        print(quote_author.strip())
You can split the text on ―, so element [0] is the quote and element [1] is the author.
Output:
“Insanity is doing the same thing, over and over again, but expecting different results.”
Narcotics Anonymous
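The same split can also be sketched with str.partition, which does not raise an IndexError when a div has no ― separator. split_quote is a hypothetical helper name, and the sample string is an assumption that mimics what quote.text returns for the HTML above.

```python
def split_quote(text, sep="―"):
    """Split 'quote ― author' text into (quote, author); author is None if sep is absent."""
    body, found, author = text.partition(sep)
    # partition returns an empty string for the separator when it is not found
    return body.strip(), (author.strip() if found else None)

# Assumed shape of quote.text for the sample div
sample = "“Insanity is doing the same thing, over and over again, but expecting different results.”  ―\nNarcotics Anonymous\n"
quote_body, quote_author = split_quote(sample)
print(quote_body)
print(quote_author)
```

Note that partition splits at the first ―, so a quote whose body itself contains that character would need str.rpartition instead.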
Upvotes: 0