Reputation: 443
I am trying to scrape web pages from around the world, many of which are not in English. So I want to translate each page with the Google Translate extension and then scrape the translated page using Selenium.
I did some research and figured out how to load an extension when starting Selenium:
1) download the Google Translate extension
But I have no idea how to trigger the extension automatically (by default, it does nothing).
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Load the packed Google Translate extension when Chrome starts
options = webdriver.ChromeOptions()
options.add_extension('./translate.crx')
driver = webdriver.Chrome(service=Service("./chromedriver"), options=options)
# driver.get() needs a full URL, including the scheme
driver.get("https://www.naver.com")
WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.TAG_NAME, "body")))
# @@@@ Here I want something like @@@@ (no such Selenium method exists)
# driver.execute_extension("translate this page")
print(driver.find_element(By.TAG_NAME, "body").text)
driver.quit()
Also, I found that the extension doesn't translate the original HTML, so I might have to use a different method for crawling (maybe sending ctrl-a, ctrl-c, ctrl-v instead of reading the body element by tag name).
Could you give me any pointers on this?
Thanks in advance
Upvotes: 1
Views: 4339
Reputation: 6267
driver.execute_extension
It seems to me that if you can open the extension with Selenium (see an example in C#), you can then use Selenium to click the TRANSLATE THIS PAGE link.
Alternatively, use the Google Translate API.
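For the Google Translate API option, here is a minimal sketch, assuming the Cloud Translation API v2 (Basic) REST endpoint and an API key from your own Google Cloud project. The helper names (build_request, parse_response, translate) and the "YOUR_API_KEY" placeholder are made up for illustration; only the endpoint and the request/response fields come from the API itself. Very long page text may need to be split into smaller pieces before sending.

```python
import json
import urllib.parse
import urllib.request

# Cloud Translation API v2 (Basic) REST endpoint.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_request(text, target, api_key):
    """Build a POST request asking the API to translate `text` into `target`."""
    url = TRANSLATE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps({"q": text, "target": target, "format": "text"})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_response(payload):
    """Extract the translated strings from the decoded JSON response."""
    return [t["translatedText"] for t in payload["data"]["translations"]]


def translate(text, target="en", api_key="YOUR_API_KEY"):
    """Translate `text`; needs network access and a valid API key (placeholder here)."""
    with urllib.request.urlopen(build_request(text, target, api_key)) as resp:
        payload = json.load(resp)
    return parse_response(payload)[0]
```

With this, you would scrape the original text with Selenium first and translate it afterwards, e.g. translate(driver.find_element(By.TAG_NAME, "body").text), so the extension is not needed at all.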
Upvotes: 1