Reputation: 665
I'm trying to extract all the URLs under https://www.scotts.com/en-us/library/lawn-food
I have realized that it does not return some URLs, such as https://www.scotts.com/en-us/library/lawn-food/when-feed-greener-lawn, and a few more.
My code snippet is below:
import time
from random import randint
import requests
from bs4 import BeautifulSoup, SoupStrainer

def scrape_google_summaries(url):
    time.sleep(randint(0, 2))  # relax and don't let google be angry
    r = requests.get(url)
    # parse only the <a href="..."> tags to keep the soup small
    soup = BeautifulSoup(r.text, "html.parser",
                         parse_only=SoupStrainer('a', href=True))
    summary = []
    for link in soup.find_all('a'):
        summary.append(link.get('href'))
    return summary

output = scrape_google_summaries("https://www.scotts.com/en-us/library/lawn-food")
Upvotes: 0
Views: 84
Reputation: 105
I'd recommend using selenium and its scroll-down functionality.
More information here: https://stackoverflow.com/a/27760083/8623540
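For reference, here is a minimal sketch of that approach, assuming Chrome and a matching chromedriver are installed; the scroll loop follows the pattern from the linked answer, repeating until the page height stops growing:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.scotts.com/en-us/library/lawn-food")

# Scroll to the bottom until the page height stops growing, so any
# lazily loaded articles are rendered before the links are collected.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the new content time to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

links = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
driver.quit()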
Upvotes: 0
Reputation: 573
I checked by saving r.text (that is, content) to a local file and opening it in my browser, and as expected, all those article links you are trying to scrape were not there..! Which means those links are being generated dynamically, and BeautifulSoup isn't meant for scraping dynamically generated website content. You will have to use some other tool such as selenium or requests_html, e.g. the sketch below.
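As a rough illustration of the requests_html route (not tested against this exact page): render() executes the page's JavaScript in a headless Chromium, which pyppeteer downloads on first use, so the dynamically generated links show up in the rendered HTML:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.scotts.com/en-us/library/lawn-food")
r.html.render()  # run the page's JavaScript before collecting links

# absolute_links is a set of every fully qualified URL on the rendered page
summary = r.html.absolute_links
print(summary)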
Upvotes: 1