Alex Olteanu

Reputation: 4248

I get translated text when I do a GET request (in Python). How do I get English content?

I'm trying to web scrape this page for fun.

The script is working fine, but the names of some movies are translated into Romanian (for example, "Beauty and the Beast" is "Frumoasa si Bestia").

I'm guessing that the server is sending me the requested content depending on my IP.

However, in my browser I see only English names, whether I use my own IP or activate a VPN through a browser extension. That's probably because the browser's language is set to English and the translate option is off.

My question is this: how can I get all the names in English?

Can I specify some parameter in my GET request to do that?

import requests
page = requests.get(some_URL)

I was also thinking about using a server VPN (not just a browser extension), but I'm running Lubuntu and installing a free VPN seems to be a lot of hassle (accounts to create, etc.).

If it helps, I use Jupyter Notebook to code.

Upvotes: 2

Views: 2374

Answers (1)

dani herrera

Reputation: 51655

I guess this site serves pages based on the browser language, which the browser sends in the Accept-Language request header. Try setting that header in requests:

import requests

url = r"http://www.imdb.com/search/title?release_date=2017&page=1&ref_=adv_nxt"
headers = {"Accept-Language": "en-US,en;q=0.5"}
r = requests.get(url, headers=headers)
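
If you fetch several pages, it may be easier to set the header once on a Session instead of passing it to every call. Here is a minimal sketch of that idea. Note that accept_language and fetch_in_english are hypothetical helper names I made up for illustration, not part of requests, and I have not checked how this particular site reacts. The q values are weights between 0 and 1 that rank your language preferences; a language without a q value counts as 1.

```python
def accept_language(preferences):
    """Build an Accept-Language value from (language_tag, weight) pairs.

    Hypothetical helper for illustration; not part of the requests library.
    """
    parts = []
    for tag, weight in preferences:
        # A weight of 1 is the default, so it is left out of the header.
        parts.append(tag if weight == 1 else f"{tag};q={weight:g}")
    return ",".join(parts)


# Same header value as in the snippet above: prefer en-US, then any English.
headers = {"Accept-Language": accept_language([("en-US", 1), ("en", 0.5)])}


def fetch_in_english(url):
    """Fetch url with the English-preferring header set on a Session."""
    import requests  # third-party; imported here so the helper above stays standalone

    with requests.Session() as session:
        # Headers set on the session are sent with every request it makes.
        session.headers.update(headers)
        response = session.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        return response.text
```

You would then call fetch_in_english(url) for each page. Whether the server honours the header is up to the site, so check the returned titles before relying on it.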

By the way, check IMDb's policy on web scraping.

Upvotes: 8
