Reputation: 665
I'm trying to extract all the URLs under https://www.scotts.com/en-us/library/lawn-food
I have realized that it does not return some URLs, such as https://www.scotts.com/en-us/library/lawn-food/when-feed-greener-lawn, and a few more.
My code snippet is below:
import time
from random import randint
import requests
from bs4 import BeautifulSoup, SoupStrainer

def scrape_google_summaries(url):
    time.sleep(randint(0, 2))  # relax and don't let google be angry
    r = requests.get(url)
    # parse only the <a href="..."> tags to keep the soup small
    soup = BeautifulSoup(r.text, "html.parser",
                         parse_only=SoupStrainer('a', href=True))
    summary = []
    for link in soup.find_all('a'):
        summary.append(link.get('href'))
    return summary

output = scrape_google_summaries("https://www.scotts.com/en-us/library/lawn-food")
Upvotes: 0
Views: 84
Reputation: 105
I'd recommend using selenium and its scroll-down functionality.
More information here: https://stackoverflow.com/a/27760083/8623540
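For reference, here is a minimal sketch of that approach, assuming Chrome and a matching chromedriver are installed; the scroll loop follows the pattern from the linked answer, repeating until the page height stops growing:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.scotts.com/en-us/library/lawn-food")

# Scroll to the bottom until the page height stops growing, so any
# lazily loaded articles are rendered before the links are collected.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the new content time to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

links = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
driver.quit()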
Upvotes: 0
Reputation: 573
I checked by saving r.text (that is, content) to a local file and opening it in my browser, and as expected, all those article links you are trying to scrape were not there..! Which means those links are being generated dynamically, and BeautifulSoup isn't meant for scraping dynamically generated website content. You will have to use some other tool such as selenium or requests_html, e.g. the sketch below.
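As a rough illustration of the requests_html route (not tested against this exact page): render() executes the page's JavaScript in a headless Chromium, which pyppeteer downloads on first use, so the dynamically generated links show up in the rendered HTML:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.scotts.com/en-us/library/lawn-food")
r.html.render()  # run the page's JavaScript before collecting links

# absolute_links is a set of every fully qualified URL on the rendered page
summary = r.html.absolute_links
print(summary)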
Upvotes: 1