Reputation: 4248
I'm trying to web scrape this page for fun.
The script is working fine, but the names of some movies are translated into Romanian (for example, "Beauty and the Beast" is "Frumoasa si Bestia").
I'm guessing that the server is sending me the requested content depending on my IP.
However, in my browser I see only English names, whether I use my own IP or activate a VPN through a browser extension. That is probably because the browser's language is set to English and the translate option is off.
My question is this: how can I get all the names in English?
Can I specify some parameter in my GET request to do that?
import requests
page = requests.get(some_URL)
I was also thinking about using a system-wide VPN (not just a browser extension), but I'm running Lubuntu and installing a free VPN seems to be a lot of hassle (accounts to create, etc.).
If it helps, I use Jupyter Notebook to code.
Upvotes: 2
Views: 2374
Reputation: 51655
I guess this site is serving pages based on the browser's language preference, which the browser sends in the Accept-Language request header. Try setting that header in requests:
import requests
url = r"http://www.imdb.com/search/title?release_date=2017&page=1&ref_=adv_nxt"
headers = {"Accept-Language": "en-US,en;q=0.5"}
r = requests.get(url, headers=headers)
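If you are scraping several pages, it may be more convenient to set the header once on a requests.Session instead of passing it to every call. The following is only a minimal sketch: the helper names make_english_session and preview_headers are made up for illustration, and it assumes the site really does choose the language from Accept-Language. preview_headers builds the request without sending it, so you can check what would go out.

```python
import requests

# Same header value as in the snippet above; the constant name is hypothetical.
ENGLISH_HEADERS = {"Accept-Language": "en-US,en;q=0.5"}


def make_english_session():
    """Return a Session that sends Accept-Language with every request."""
    session = requests.Session()
    session.headers.update(ENGLISH_HEADERS)
    return session


def preview_headers(session, url):
    """Prepare a GET request without sending it and return its headers."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return dict(prepared.headers)


if __name__ == "__main__":
    session = make_english_session()
    url = "http://www.imdb.com/search/title?release_date=2017&page=1&ref_=adv_nxt"
    # Inspect the outgoing header before making any network call.
    print(preview_headers(session, url)["Accept-Language"])
    # To actually fetch the page: page = session.get(url)
```

Every session.get(...) made through this session then carries the same language preference, so you do not have to repeat headers=headers on each request.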
By the way, check IMDb's policy on web scraping.
Upvotes: 8