huzefausama

Reputation: 443

Web scraping with Selenium using Google Translate

I am trying to scrape web pages from sites around the world, so I want to translate each page with the Google Translate extension and then scrape it with Selenium.

I did some research and figured out how to add an extension when running Selenium:

1) Download the Google Translate extension

2) Create a .crx file

3) Add the extension to Selenium

However, I have no idea how to trigger the extension automatically (by default, it does nothing).

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Load the packed Google Translate extension into Chrome
option = webdriver.ChromeOptions()
option.add_extension('./translate.crx')
driver = webdriver.Chrome(service=Service("./chromedriver"), options=option)

# driver.get() needs a full URL, including the scheme
driver.get("https://naver.com")
WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.TAG_NAME, "body")))

''' @@@@ Here I want something like@@@@
driver.execute_extension("translate this page")
'''

print(driver.find_element(By.TAG_NAME, "body").text)
driver.quit()

Also, I found that the extension doesn't translate the original HTML, so I might have to use a different method for crawling (maybe sending Ctrl-A, Ctrl-C, Ctrl-V instead of reading the text of the body element).

Could you give me any pointers on this?

Thanks in advance

Upvotes: 1

Views: 4339

Answers (1)

Igor Savinkin

Reputation: 6267

driver.execute_extension

It seems to me that if you can open the extension with Selenium (see an example in C#), then you can also use Selenium to click the TRANSLATE THIS PAGE link.


Shortcut

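Use the Google Translate API instead: scrape the page text with Selenium as usual and send that text to the API for translation, without involving the extension at all.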
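A minimal sketch of what that shortcut could look like, assuming the Cloud Translation v2 REST endpoint and an API key you have created yourself. The helper names (build_request, parse_response, translate_text) and the injectable urlopen parameter are made up for illustration; they are not part of Selenium or of any Google client library.

```python
import json
import urllib.parse
import urllib.request

# Cloud Translation v2 REST endpoint (assumed here; check your own project setup).
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_request(text, target, api_key):
    """Build a POST request asking for `text` to be translated into `target`."""
    url = TRANSLATE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps({"q": text, "target": target, "format": "text"})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_response(payload):
    """Pull the first translated string out of the decoded JSON response."""
    return payload["data"]["translations"][0]["translatedText"]


def translate_text(text, target, api_key, urlopen=urllib.request.urlopen):
    """Translate `text`; `urlopen` is a hypothetical hook so the call can be faked in tests."""
    request = build_request(text, target, api_key)
    with urlopen(request) as response:
        return parse_response(json.loads(response.read().decode("utf-8")))
```

In the question's script, the text of the body element would be passed in as `text` with "en" as `target`. A whole page of text may exceed the size the API accepts for a single request, so splitting it into smaller pieces before calling translate_text is probably necessary.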

Upvotes: 1
