Web scraping: Unable to loop into div element with class to get text and URL

Question

I am trying to scrape a website, using a div and its class to get the contents inside it.

I am able to get the proper data, but I get an error when I put it inside the loop.

html = BeautifulSoup(response, 'html.parser')
post_list = html.find_all('div', class_='eodLhs')
print(post_list)
i = 0

for values in post_list:
     url_json = {'title': values.ul.li[i].a.text, 'url': values.ul.li[i].a['href']}
     names.append(values.ul.li[i].a.text)
i = i+1

The output of the print statement is here: https://gist.github.com/parikhparth23/48669444506502f11409d43b30a4250d

It throws an error at this line:

url_json = {'title': values.ul.li[i].a.text, 'url': values.ul.li[i].a['href']}

I want to get the text and URL after scraping.
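For reference, a likely cause of the error: `values.ul.li` is a single Tag (only the first <li> inside the <ul>), and indexing a BeautifulSoup Tag with [] looks up an HTML attribute, so `li[i]` asks for an attribute named 0 rather than the i-th list item. The counter `i` is also incremented outside the loop, and `names` is never defined. The sketch below iterates over every <li> instead. The sample markup is hypothetical (the real structure is only in the gist), and `extract_links` is an illustrative helper name, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the scraped page; the real HTML is in the gist.
SAMPLE_HTML = """
<div class="eodLhs">
  <ul>
    <li><a href="/story-one">Story one</a></li>
    <li><a href="/story-two">Story two</a></li>
  </ul>
</div>
"""


def extract_links(markup):
    """Return one {'title', 'url'} dict per linked <li> inside div.eodLhs."""
    html = BeautifulSoup(markup, 'html.parser')
    results = []
    for div in html.find_all('div', class_='eodLhs'):
        # values.ul.li would only give the FIRST <li>; loop over all of them instead.
        for li in div.find_all('li'):
            a = li.find('a', href=True)  # first <a> that actually has an href
            if a is None:
                continue  # skip list items without a link
            results.append({'title': a.get_text(strip=True), 'url': a['href']})
    return results


url_json = extract_links(SAMPLE_HTML)
names = [item['title'] for item in url_json]
print(url_json)
```

Collecting the dicts into a list (rather than reassigning `url_json` on every pass) keeps every title/URL pair instead of only the last one.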

QHarr · Accepted Answer

Based on your gist, I think you can just use a CSS selector that only matches elements with an href inside that parent class. In your existing code the i increment should happen inside the loop, but it isn't needed if you rewrite it as I describe. Use a "starts with" operator (^=) on the href attribute value to remove the share links, as I suspect you only want the original links to content:

for i in soup.select(".eodLhs [href^='/']"):
    print({i.text:i['href']})
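A small self-contained check of how that selector behaves, as a sketch under assumptions: the markup below is hypothetical, with one relative content link, one absolute share link (its href starts with "https", not "/"), and one relative link outside the target div. Only the first should survive the `.eodLhs [href^='/']` filter.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure is only in the question's gist.
SAMPLE_HTML = """
<div class="eodLhs">
  <ul>
    <li><a href="/news/article-1">Article 1</a></li>
    <li><a href="https://www.facebook.com/sharer/sharer.php?u=/news/article-1">Share</a></li>
  </ul>
</div>
<div class="other"><a href="/ignored">Outside the target div</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Descendant selector: any element inside .eodLhs whose href starts with "/".
links = [{'title': a.get_text(strip=True), 'url': a['href']}
         for a in soup.select(".eodLhs [href^='/']")]
print(links)
```

Because the filtering happens in the selector, no index counter or nested .ul.li navigation is needed.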
