tpstackexchange

Reputation: 23

Scrapy response is in a different language from the request and response URL

I'm trying to scrape search results from this page

http://eur-lex.europa.eu/search.html?qid=1437402891621&DB_TYPE_OF_ACT=advGeneral&CASE_LAW_SUMMARY=false&DTS_DOM=EU_LAW&typeOfActStatus=ADV_GENERAL&type=advanced&lang=fr&SUBDOM_INIT=EU_CASE_LAW&DTS_SUBDOM=EU_CASE_LAW

The language according to the URL is French (lang=fr), and that is the URL I see in the scrapy shell, following 'Crawled (200)'.

If I try response.url I also get a url with lang=fr.

Viewing the page in a browser shows me French results.

However, the body of the response is English.

I've tried disabling cookies in my scrapy settings.py file. I've also set DEFAULT_REQUEST_HEADERS to include 'Accept-Language': 'fr'.
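For reference, the two settings described above would look roughly like this in settings.py. This is a sketch only: the actual file was not posted, and the rest of the project settings are omitted.

```python
# settings.py (sketch; only the two settings mentioned in the question)

# Stop Scrapy's cookie middleware from storing and sending cookies.
COOKIES_ENABLED = False

# Headers sent with every request unless a request overrides them.
DEFAULT_REQUEST_HEADERS = {
    "Accept-Language": "fr",
}
```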

Any ideas?

Upvotes: 2

Views: 2421

Answers (1)

Frank Martin

Reputation: 2594

In the upper right corner of the webpage there's a drop-down field to choose the language of the website. Selecting French there adds another parameter to the URL: &locale=fr.

So add that parameter to the URL in your spider's start_urls.
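A minimal sketch of appending the parameter with the standard library, rather than editing the URL by hand. The helper name with_locale and the constants SEARCH_URL and START_URL are made up for illustration; only the locale=fr parameter and the search URL come from this thread.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse


def with_locale(url, locale="fr"):
    """Return url with a 'locale' query parameter set to the given value.

    Hypothetical helper: any existing 'locale' parameter is dropped first,
    so the result contains exactly one.
    """
    parts = urlparse(url)
    # Keep every existing parameter except a previous 'locale'.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "locale"
    ]
    query.append(("locale", locale))
    return urlunparse(parts._replace(query=urlencode(query)))


# The search URL from the question.
SEARCH_URL = (
    "http://eur-lex.europa.eu/search.html?qid=1437402891621"
    "&DB_TYPE_OF_ACT=advGeneral&CASE_LAW_SUMMARY=false&DTS_DOM=EU_LAW"
    "&typeOfActStatus=ADV_GENERAL&type=advanced&lang=fr"
    "&SUBDOM_INIT=EU_CASE_LAW&DTS_SUBDOM=EU_CASE_LAW"
)

# Same URL with &locale=fr appended.
START_URL = with_locale(SEARCH_URL)
```

In a spider, START_URL would then go into the start_urls list, for example start_urls = [START_URL].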

Upvotes: 1
