user3928006

Reputation: 93

How can I render HTML containing JavaScript to plain HTML in Python?

I have looked around and only found solutions that render a URL to HTML. However, I need a way to render a webpage that I already have (and that contains JavaScript) to proper HTML.

Want: Webpage (with JavaScript) ---> HTML

Not: URL --> Webpage (with JavaScript) ---> HTML

I couldn't figure out how to make the other code work the way I wanted.

This is the code I was using that renders URLs: http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/

For clarity, the code above takes the URL of a webpage whose content is partly rendered by JavaScript. If I scrape that page normally, say with urllib2, I won't get the links etc. that only appear after the JavaScript has run.

However, I want to scrape a page, say again with urllib2, then render that page and get the resulting HTML. (This differs from the code above, which takes a URL as its argument.)

Any help is appreciated, thanks guys :)

Upvotes: 9

Views: 30686

Answers (3)

vv2006-mc

Reputation: 107

The module I use for this is requests_html. The first time it is used, it automatically downloads a Chromium browser; after that you can render any webpage (with JavaScript).

requests_html also supports HTML parsing.

It is basically an alternative to Selenium.

Example:

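from requests_html import HTMLSession

session = HTMLSession()
r = session.get(URL)  # URL is the address of the page to fetch
r.html.render()  # you can use r.html.render(sleep=1) if you want
html = r.html.html  # the HTML after the JavaScript has run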
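Since the question is about HTML that has already been downloaded, here is a minimal sketch of rendering a string instead of a URL. It assumes the requests_html package is installed and relies on its standalone HTML class together with the reload=False argument of render(), as I understand them from the library's README. The function name render_with_requests_html and its sleep parameter are only illustrative.

```python
def render_with_requests_html(doc, sleep=0):
    """Run the JavaScript embedded in an HTML string and return the resulting HTML."""
    # Imported lazily: requests_html is a third-party package, and it
    # downloads Chromium the first time render() is called.
    from requests_html import HTML

    html = HTML(html=doc)
    # reload=False asks render() to use the markup already in memory
    # rather than fetching html.url again.
    html.render(reload=False, sleep=sleep)
    return html.html
```

Calling render_with_requests_html(data), where data is the page source you fetched yourself, should return the markup after inline scripts have run. Scripts that load resources through relative URLs may not behave the same way, because the page is no longer served from its original address.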


Upvotes: 4

peter wambua

Reputation: 1

Try Selenium's webdriver.Firefox().get('url').

Upvotes: -1

barak manos

Reputation: 30146

You can pip install selenium from a command line, and then run something like:

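from selenium import webdriver
from urllib2 import urlopen  # Python 2 module; Python 3 moved this to urllib.request

url = 'http://www.google.com'
file_name = 'C:/Users/Desktop/test.txt'

# Download the raw page source (the JavaScript has not run yet)
conn = urlopen(url)
data = conn.read()
conn.close()

# Save the source to a local file
file = open(file_name,'wt')
file.write(data)
file.close()

# Open the local file in a browser so the JavaScript runs, then read the resulting HTML
browser = webdriver.Firefox()
browser.get('file:///'+file_name)
html = browser.page_source
browser.quit()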
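The same idea can be wrapped into a reusable helper for Python 3. This is only a sketch under a few assumptions: the names html_to_file_uri and render_with_selenium are made up for illustration, the HTML is already available as a str, and Firefox plus its driver are installed for Selenium. A temporary file replaces the hard-coded path, and pathlib builds the file:// URL.

```python
import tempfile
from pathlib import Path


def html_to_file_uri(html, directory=None):
    """Write an HTML string to a temporary .html file and return its file:// URI."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a browser can open it afterwards.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", encoding="utf-8", delete=False, dir=directory
    ) as handle:
        handle.write(html)
    # as_uri() builds a correctly escaped file:// URL on Windows and POSIX.
    return Path(handle.name).resolve().as_uri()


def render_with_selenium(html):
    """Load already-downloaded HTML in Firefox and return the page source after scripts ran."""
    # Imported lazily so html_to_file_uri() works without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        browser.get(html_to_file_uri(html))
        return browser.page_source
    finally:
        # Close the browser even if loading the page fails.
        browser.quit()
```

With this, render_with_selenium(data) takes the page source directly, so no URL is needed at render time. Because the page is loaded from disk, scripts or links that use relative URLs will not resolve against the original site.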

Upvotes: 13
