Yagnesh
Yagnesh

Reputation: 3

is there any way to translate web page language, or translating scraped data while scraping using scrapy?

i am going to scrape dintex.net website in english laguage, but can't find any way to convert scraped data in English language. I also used googletans but it also shows error, so is there any other way to convert that page or data to English?

import scrapy
from googletrans import Translator
class DtSpider(scrapy.Spider):
name = 'dt'
start_urls = ['http://www.dintex.net']

def parse(self, response):
    urls = response.xpath('//*[@class="listing-btn btn btn-primary btn-block w-100"]/@href').extract()
    for url in urls:
        url = response.urljoin(url)
        yield scrapy.Request(url=url, callback=self.parse_details)

    np = response.xpath('//*[@class="page-item"]/a[@rel="next"]/@href').extract_first()
    ap = response.urljoin(np)
    yield scrapy.Request(url=ap,callback=self.parse)

def parse_details(self,response):
    Title = response.xpath('//*[@class="show-info__title"]/text()').extract_first()
    Location = response.xpath('//*[@class="show-info__location"]/p/text()').extract_first()
    Contact = response.xpath('//*[@class="show-info__contact-details__phone-link"]/text()').extract_first()
    Contact = Contact.replace('Whatsapp ','')
    Description = response.xpath('//*[@class="show-info__section-text"]/p/text()').extract_first()
    Manufacture = response.xpath('//td[contains(text(),"Fabricante")]/following-sibling::td/text()').extract_first()
    Model = response.xpath('//td[contains(text(),"Modelo")]/following-sibling::td/text()').extract_first()
    Year = response.xpath('//td[contains(text(),"Año")]/following-sibling::td/text()').extract_first()
    Condition = response.xpath('//td[contains(text(),"Condición")]/following-sibling::td/text()').extract_first()
    img = response.xpath('//*[@class="gallery__item"]/img/@src').extract_first()
    thumbs =  response.xpath('//img/@lazy-src').extract()

    #t = Translator()
    #Title = t.translate(Title).text
    #Location = t.translate(Location).text
    #Contact = t.translate(Contact).text
    #Description = t.translate(Description).text
    #Manufacture = t.translate(Manufacture).text
    #Model = t.translate(Model).text
    #Year = t.translate(Year).text
    #Condition = t.translate(Condition).text

    yield{'Title': Title,
    'Location' : Location,
    'Contact' : Contact,
    'Description' : Description,
    'Manufacture' : Manufacture,
    'Model' : Model,
    'Year' : Year,
    'Condition' : Condition,
    'Img' : img,
    'Thums' : thumbs
    }

Upvotes: 0

Views: 2435

Answers (1)

Biswanath
Biswanath

Reputation: 9185

I think you should send this cookie with your requests

googtrans=/es/en

As the page allows for localisation depending on selection of the available langauage/region.

You would need to do something like this see cookie part from the scrapy request from scrapy docs

The request you are yielding might need to change something like this(not tested)

scrapy.Request(url=url, cookies= {'googletrans': '/es/en'}, callback=self.parse_details)

Upvotes: 4

Related Questions