Reputation: 379
I am trying to scrape a website, using the div tag and its class to get the contents inside it.
I am able to get the proper data, but I get an error when I put it inside the loop.
html = BeautifulSoup(response, 'html.parser')
post_list = html.find_all('div', class_='eodLhs')
print(post_list)
i = 0
for values in post_list:
    url_json = {'title': values.ul.li[i].a.text, 'url': values.ul.li[i].a['href']}
    names.append(values.ul.li[i].a.text)
i = i+1
The output of the print statement is: https://gist.github.com/parikhparth23/48669444506502f11409d43b30a4250d
It throws an error at this line:
url_json = {'title': values.ul.li[i].a.text, 'url': values.ul.li[i].a['href']}
I want to get the text and the URL of each link after scraping.
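A likely cause of the error, going by how BeautifulSoup's Tag objects behave: `values.ul.li` is a single Tag (only the first `<li>`), not a list, and indexing a Tag with `[i]` looks up an HTML attribute named `i`, so `values.ul.li[0]` raises `KeyError: 0`. Below is a minimal sketch of the loop that instead iterates over every `<li>` with `find_all('li')`. It assumes each `div.eodLhs` holds a `<ul>` of `<li><a href=...>` items (inferred from the question's code); `SAMPLE_HTML` and `collect_links` are hypothetical stand-ins, since the real markup is only in the gist.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (the actual structure
# is only visible in the question's gist).
SAMPLE_HTML = """
<div class="eodLhs">
  <ul>
    <li><a href="/news/first-post">First post</a></li>
    <li><a href="/news/second-post">Second post</a></li>
  </ul>
</div>
"""


def collect_links(html_text):
    """Return (names, records) for every <li><a href> inside div.eodLhs."""
    soup = BeautifulSoup(html_text, 'html.parser')
    names = []
    records = []
    for values in soup.find_all('div', class_='eodLhs'):
        ul = values.find('ul')
        if ul is None:  # skip divs that contain no list
            continue
        # find_all('li') returns every list item, so no manual counter is needed.
        for li in ul.find_all('li'):
            link = li.find('a')
            if link is None or link.get('href') is None:  # skip items without a link
                continue
            records.append({'title': link.text, 'url': link['href']})
            names.append(link.text)
    return names, records


if __name__ == '__main__':
    first_li = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('li')
    try:
        first_li[0]  # Tag[...] is an attribute lookup, not list indexing
    except KeyError as exc:
        print('values.ul.li[0] raises KeyError:', exc)
    names, records = collect_links(SAMPLE_HTML)
    print(names)
    print(records)
```

With the sample markup this prints the `KeyError` line followed by the two titles and their records; on the real page, the share links the answer mentions would also be collected unless filtered out.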
Upvotes: 0
Views: 82
Reputation: 84465
Based on your gist, I think you can just use a CSS selector that ensures you only match child elements with an href inside that parent class. In your existing code the i increment should happen inside the loop, but it isn't needed if you rewrite it as I describe. Use a "starts with" operator (^=) on the attribute value to remove the share links, as I suspect you only want the original links to the content:
# soup is the parsed BeautifulSoup object (named html in the question)
for link in soup.select(".eodLhs [href^='/']"):
    print({link.text: link['href']})
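To see the selector work without fetching the live page, here is a self-contained sketch run against hypothetical markup. It assumes, as the answer suspects, that content links have site-relative hrefs starting with `/` while share links are absolute URLs; `SAMPLE_HTML`, the share URLs, and `content_links` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each item has a site-relative content link plus an
# absolute share link, mimicking what the answer suspects the page contains.
SAMPLE_HTML = """
<div class="eodLhs">
  <ul>
    <li>
      <a href="/news/first-post">First post</a>
      <a href="https://twitter.com/share?u=/news/first-post">Tweet</a>
    </li>
    <li>
      <a href="/news/second-post">Second post</a>
      <a href="https://www.facebook.com/sharer.php?u=/news/second-post">Share</a>
    </li>
  </ul>
</div>
"""


def content_links(html_text):
    """Return one {text: href} dict per matching link, in document order."""
    soup = BeautifulSoup(html_text, 'html.parser')
    # ".eodLhs [href^='/']" matches any descendant of an element with class
    # eodLhs whose href attribute starts with "/", so absolute share URLs
    # (starting with "https") are not matched.
    return [{link.text: link['href']} for link in soup.select(".eodLhs [href^='/']")]


if __name__ == '__main__':
    for item in content_links(SAMPLE_HTML):
        print(item)
```

Because the filtering lives in the selector, there is no index bookkeeping at all; if the real share links turn out to be relative too, the `^=` prefix would need adjusting to whatever distinguishes them.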
Upvotes: 1