Reputation: 3
I am trying to scrape the dintex.net website in English, but I can't find a way to convert the scraped data to English. I also tried googletrans, but it raises an error. Is there any other way to convert the page or its data to English?
import scrapy
from googletrans import Translator


class DtSpider(scrapy.Spider):
    name = 'dt'
    start_urls = ['http://www.dintex.net']

    def parse(self, response):
        # Follow every listing link on the current page.
        urls = response.xpath('//*[@class="listing-btn btn btn-primary btn-block w-100"]/@href').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_details)
        # Follow pagination only when a "next" link exists.
        np = response.xpath('//*[@class="page-item"]/a[@rel="next"]/@href').extract_first()
        if np:
            ap = response.urljoin(np)
            yield scrapy.Request(url=ap, callback=self.parse)

    def parse_details(self, response):
        Title = response.xpath('//*[@class="show-info__title"]/text()').extract_first()
        Location = response.xpath('//*[@class="show-info__location"]/p/text()').extract_first()
        Contact = response.xpath('//*[@class="show-info__contact-details__phone-link"]/text()').extract_first()
        # extract_first() returns None when nothing matches, so guard before calling str methods.
        if Contact:
            Contact = Contact.replace('Whatsapp ', '')
        Description = response.xpath('//*[@class="show-info__section-text"]/p/text()').extract_first()
        Manufacture = response.xpath('//td[contains(text(),"Fabricante")]/following-sibling::td/text()').extract_first()
        Model = response.xpath('//td[contains(text(),"Modelo")]/following-sibling::td/text()').extract_first()
        Year = response.xpath('//td[contains(text(),"Año")]/following-sibling::td/text()').extract_first()
        Condition = response.xpath('//td[contains(text(),"Condición")]/following-sibling::td/text()').extract_first()
        img = response.xpath('//*[@class="gallery__item"]/img/@src').extract_first()
        thumbs = response.xpath('//img/@lazy-src').extract()
        # Translation attempt with googletrans, commented out because it raises an error:
        #t = Translator()
        #Title = t.translate(Title).text
        #Location = t.translate(Location).text
        #Contact = t.translate(Contact).text
        #Description = t.translate(Description).text
        #Manufacture = t.translate(Manufacture).text
        #Model = t.translate(Model).text
        #Year = t.translate(Year).text
        #Condition = t.translate(Condition).text
        yield {
            'Title': Title,
            'Location': Location,
            'Contact': Contact,
            'Description': Description,
            'Manufacture': Manufacture,
            'Model': Model,
            'Year': Year,
            'Condition': Condition,
            'Img': img,
            'Thums': thumbs,
        }
Upvotes: 0
Views: 2435
Reputation: 9185
I think you should send this cookie with your requests:
googtrans=/es/en
The page supports localisation based on which of the available languages/regions is selected, and this cookie carries that selection.
To send it, see the cookies part of scrapy.Request in the Scrapy docs.
The request you are yielding might need to change to something like this (not tested):
scrapy.Request(url=url, cookies={'googtrans': '/es/en'}, callback=self.parse_details)
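To make that a little more concrete, here is a minimal sketch of how the cookie could be attached to the detail requests. It uses the cookie name `googtrans` from the cookie line above; the helper names `googtrans_cookie` and `build_detail_request` are hypothetical and only for illustration. Like the one-liner above, this is untested against the site, and it assumes the server actually honours the cookie.

```python
def googtrans_cookie(source="es", target="en"):
    """Return the translation cookie as a dict, e.g. {'googtrans': '/es/en'}."""
    return {"googtrans": f"/{source}/{target}"}


def build_detail_request(url, callback, source="es", target="en"):
    """Build a scrapy.Request that carries the translation cookie.

    Hypothetical helper: the names and defaults are assumptions.
    """
    # Imported here so googtrans_cookie() can be used without Scrapy installed.
    import scrapy

    return scrapy.Request(
        url=url,
        cookies=googtrans_cookie(source, target),
        callback=callback,
    )
```

Inside `parse`, the existing yield would then become `yield build_detail_request(response.urljoin(url), self.parse_details)`. If the scraped text still comes back in Spanish, the translation is probably applied in the browser by JavaScript, in which case the cookie alone will not change the HTML that Scrapy receives.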
Upvotes: 4