xinyu

Reputation: 138

How do I make an HTTP request that returns the whole page source when part of the HTML is loaded by JavaScript?

I want to get the HTML of https://www.collinsdictionary.com/dictionary/english/supremacy, but part of the page is loaded by JavaScript. When I fetch it with HTTP.request() from HTTP.jl, I only get the HTML that exists before the JavaScript runs, so the result differs from the page I see in Chrome. How can I get the same page that Chrome gets? Do I have to use WebDriver.jl, which is a wrapper around the Python bindings of Selenium WebDriver?

Part of my source:

function get_page(w::word)::Bool
    response = nothing
    try
        response = HTTP.request("GET", "https://www.collinsdictionary.com/dictionary/$(dictionary)/$(w.org_word)",
                                                 connect_timeout=connect_timeout, readtimeout=readtimeout, retries=retries, redirect=true,proxy=proxy)
    catch e
        push!(w.err_log, [get_page_http_err, string(e)])
        return false
    end
    open("./assets/org_page.html", "w") do f 
        write(f, String(response.body))
    end
    return true
end

Both dictionary and w.org_word are Strings, and the function is defined inside a module.

Upvotes: 2

Views: 337

Answers (1)

jling

Reputation: 2301

What you want is impossible to achieve with HTTP.jl alone. HTTP.jl only downloads what the server sends; running the JavaScript part of the page is a fundamentally different task. You need a JavaScript engine for that, which is far from simple.

This is not a weakness unique to Julia's HTTP.jl. Python has the same limitation; see the question "Python requests.get(url) returning javascript code instead of the page html".

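(The separate Python library requests-html, from the author of requests, has added a JavaScript rendering ability. requests itself, which is a third-party package and not part of the standard library, still does not run JavaScript.)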
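One way to borrow a JavaScript engine without a WebDriver wrapper is to let a headless browser render the page and print the resulting DOM. The following is only a minimal sketch under stated assumptions: a Chromium-based browser is installed and reachable on the PATH as `chromium` (the executable name varies by system, for example `google-chrome`), and the helper names `dump_dom_cmd` and `get_rendered_page` are hypothetical, not part of HTTP.jl or any other package. Content that the page loads late may still be missing, because the browser decides when to dump the DOM.

```julia
# Hypothetical helper: build the command that asks a headless Chromium-based
# browser to print the DOM after the page's scripts have run.
# ASSUMPTION: `browser` names an executable on the PATH; adjust for your system.
function dump_dom_cmd(url::AbstractString; browser::AbstractString="chromium")
    return `$browser --headless --dump-dom $url`
end

# Hypothetical helper: run the browser and return the rendered HTML as a String.
# Throws if the browser is missing or exits with an error.
function get_rendered_page(url::AbstractString; browser::AbstractString="chromium")
    return read(dump_dom_cmd(url; browser=browser), String)
end
```

With that in place, something like `html = get_rendered_page("https://www.collinsdictionary.com/dictionary/english/supremacy")` could replace the `HTTP.request` call in `get_page`, with the same `try`/`catch` around it. Timeouts, retries, and the proxy would then have to be handled through the browser rather than through HTTP.jl keyword arguments.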

Upvotes: 1
