I want to scrape nutrient data from this page: http://www.mysupermarket.co.uk/tesco-price-comparison/Fruit/Tesco_Gala_Apple_Approx_160g.html.
I tried the following code.
import requests, bs4
res = requests.get('http://www.mysupermarket.co.uk/tesco-price-comparison/Fruit/Tesco_Gala_Apple_Approx_160g.html')
But the response's text does not match the HTML I see when I inspect the page in my browser,
so I can't use Beautiful Soup to search it.
How can I fix this?
Upvotes: 0
Views: 1258
Reputation: 571
This issue arises because a site can send differently structured HTML to different browsers. The server tells them apart by the User-Agent header each browser sends.
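A quick way to test the User-Agent explanation is to repeat the request while identifying as a browser instead of as python-requests. This is a minimal sketch, assuming requests is installed; fetch_as_browser is a hypothetical helper name (not part of requests), and the User-Agent string is only an illustrative desktop Chrome value.

```python
# Assumption: an example desktop Chrome User-Agent string; any real browser's UA would do
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def fetch_as_browser(url, session=None, timeout=10):
    """Hypothetical helper: fetch url with a browser-like User-Agent and return the HTML text."""
    if session is None:
        # By default requests identifies itself as "python-requests/<version>"
        import requests
        get = requests.get
    else:
        # Allow a requests.Session (or any object with a compatible get()) to be reused
        get = session.get
    res = get(url, headers={"User-Agent": BROWSER_UA}, timeout=timeout)
    res.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return res.text
```

Calling fetch_as_browser with the product URL from the question and comparing the result with the plain res.text shows whether the User-Agent actually changes what the server returns.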
If you want to get the same HTML the browser shows, use Selenium WebDriver. It is easy and convenient to use: once the page has loaded, take its source code (driver.page_source) and run Beautiful Soup on that.
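A minimal sketch of that workflow, assuming Selenium 4 and Firefox are installed. get_rendered_html is a hypothetical helper name, and ready_selector is an optional CSS selector you would choose yourself (for example, one matching the nutrition table); nothing here is specific to the mysupermarket page.

```python
def get_rendered_html(url, driver=None, ready_selector=None, timeout=10):
    """Hypothetical helper: load url in a real browser and return the resulting HTML."""
    owns_driver = driver is None
    if owns_driver:
        # Assumption: Selenium 4 with Firefox installed; webdriver.Chrome() works similarly
        from selenium import webdriver
        driver = webdriver.Firefox()
    try:
        driver.get(url)  # returns once the browser reports the page as loaded
        if ready_selector is not None:
            # Optionally wait until an element you expect is present in the DOM
            from selenium.webdriver.common.by import By
            from selenium.webdriver.support import expected_conditions as EC
            from selenium.webdriver.support.ui import WebDriverWait
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, ready_selector))
            )
        return driver.page_source  # HTML of the page as currently loaded in the browser
    finally:
        if owns_driver:
            driver.quit()  # close the browser only if this function started it
```

The returned string can then be handed to Beautiful Soup, e.g. bs4.BeautifulSoup(html, 'html.parser'). Passing in an existing driver lets you fetch several pages without restarting the browser each time.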
If you want to learn how to use Selenium, check out here.
If you still face problems, feel free to contact me.
Upvotes: 0
Reputation: 302
A good alternative is the newly released requests-HTML library, written by the same author as requests.
With it, you can parse HTML as simply as this:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://python.org/')
# find() takes a CSS selector; first=True returns the first match, or None if nothing matches
about = r.html.find('#about', first=True)
print(about.text if about is not None else 'selector not found')
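requests-HTML can also execute a page's JavaScript through r.html.render(), which drives a headless Chromium (downloaded the first time render() is called). That is relevant when the data you see in the browser is filled in by scripts. This is a sketch, assuming requests-html is installed; rendered_text is a hypothetical helper, and the selector is whatever you pick for the element you want.

```python
def rendered_text(url, selector, session=None, sleep=1):
    """Hypothetical helper: fetch url, run its JavaScript, return the text of the first match."""
    owns_session = session is None
    if owns_session:
        from requests_html import HTMLSession
        session = HTMLSession()
    try:
        r = session.get(url)
        # Execute the page's scripts in headless Chromium; sleep gives them time to finish
        r.html.render(sleep=sleep)
        element = r.html.find(selector, first=True)
        return None if element is None else element.text
    finally:
        if owns_session:
            session.close()  # also shuts down the browser that render() started
```

Without render(), find() only sees the HTML the server sent, which is the same markup requests.get returns.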
Check it out at the official site.
Thank you.
Upvotes: 2
Reputation: 8628
You need to retrieve the markup from the .text attribute of the res object. Your code should then read:
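import requests, bs4
res = requests.get('http://www.mysupermarket.co.uk/tesco-price-comparison/Fruit/Tesco_Gala_Apple_Approx_160g.html')
# res.text is the response body decoded to a string
html = res.text
# hand the markup to Beautiful Soup so it can be searched
soup = bs4.BeautifulSoup(html, 'html.parser')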
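Before parsing, it can help to check whether the text you see in the browser is in res.text at all. If it is missing, Beautiful Soup cannot find it either, and the page has to be fetched differently (for example because the content is inserted later by JavaScript or served differently to requests). A minimal stdlib-only sketch; markup_contains is a hypothetical helper name.

```python
import re


def _normalise(s):
    # Collapse runs of whitespace and ignore case so formatting differences don't matter
    return re.sub(r"\s+", " ", s).strip().lower()


def markup_contains(html, visible_text):
    """Hypothetical helper: True if visible_text occurs somewhere in the raw HTML string."""
    return _normalise(visible_text) in _normalise(html)
```

For example, markup_contains(res.text, "Energy") checks for a label you expect in the nutrition table ("Energy" is only an illustrative guess). Text split across tags will not match as a single phrase, so search for short fragments.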
Upvotes: 0