Reputation: 647
I want to scrape the Google translate website and get the translated text from it using Python 3.
Here is my code:
from bs4 import BeautifulSoup as soup
from urllib.request import Request as uReq
from urllib.request import urlopen as open
my_url = "https://translate.google.com/#en/es/I%20am%20Animikh%20Aich"
req = uReq(my_url, headers={'User-Agent':'Mozilla/5.0'})
uClient = open(req)
page_html = uClient.read()
uClient.close()
html = soup(page_html, 'html5lib')
print(html)
Unfortunately, I am unable to find the required information in the parsed Webpage. In chrome "Inspect", It is showing that the translated text is inside:
<span id="result_box" class="short_text" lang="es"><span class="">Yo soy Animikh Aich</span></span>
However, When I am searching for the information in the parsed HTML code, this is what I'm finding in it:
<span class="short_text" id="result_box"></span>
I have tried parsing using all of html5lib, lxml, html.parser. I have not been able to find a solution for this. Please help me with the issue.
Upvotes: 2
Views: 350
Reputation: 22440
Try like below to get the desired content:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://translate.google.com/#en/es/I%20am%20Animikh%20Aich")
soup = BeautifulSoup(driver.page_source, 'html5lib')
item = soup.select_one("#result_box span").text
print(item)
driver.quit()
Output:
Yo soy Animikh Aich
Upvotes: 1
Reputation: 7238
JavaScript is modifying the HTML code after it loads. urllib
can't handle JavaScript, you'll have to use Selenium
to get the data that you want.
For installation and demo, refer this link.
Upvotes: 1
Reputation: 3212
you could use a specific python api:
import goslate
gs = goslate.Goslate()
print(gs.translate('I am Animikh Aich', 'es'))
Yo soy Animikh Aich
Upvotes: 2