Reputation: 138
I want to get the HTML page from https://www.collinsdictionary.com/dictionary/english/supremacy, but part of the HTML is loaded by JavaScript. When I use HTTP.jl to get the page with HTTP.request(), I only get the part of the HTML that is loaded before the JavaScript has run, so the page I get is different from the one I get in Chrome. How can I get the same page that Chrome gets? Do I have to use WebDriver.jl, which is a wrapper around Selenium WebDriver's Python bindings?
Part of my source:
    function get_page(w::word)::Bool
        response = nothing
        try
            # Fetch the raw HTML; no JavaScript is executed at this point.
            response = HTTP.request("GET", "https://www.collinsdictionary.com/dictionary/$(dictionary)/$(w.org_word)",
                connect_timeout=connect_timeout, readtimeout=readtimeout, retries=retries, redirect=true, proxy=proxy)
        catch e
            # Record the failure on the word and report it to the caller.
            push!(w.err_log, [get_page_http_err, string(e)])
            return false
        end
        # Save the response body for later parsing.
        open("./assets/org_page.html", "w") do f
            write(f, String(response.body))
        end
        return true
    end
dictionary and w.org_word are both of type String, and the function is in a module.
Upvotes: 2
Views: 337
Reputation: 2301
What you want is impossible to achieve with HTTP.jl alone. Running the JavaScript part of the page is a fundamentally different task: you need a JavaScript engine to do so, which is far from simple.
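And this is not a weakness unique to Julia's HTTP.jl; see the question "Python requests.get(url) returning javascript code instead of the page html". (Python's requests library is third-party rather than part of the standard library; JavaScript rendering seems to have been added more recently through the related requests-html package.)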
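One way to borrow a full JavaScript engine without Selenium is to shell out to a headless browser and read back the rendered DOM. The following is only a minimal sketch, assuming a Chrome or Chromium executable is installed and reachable under the name `chromium` (that name, and the helper functions `dump_dom_cmd` and `fetch_rendered_html`, are illustrative and not part of HTTP.jl or any Julia package). It relies on the browser's `--headless` and `--dump-dom` command-line flags.

```julia
# Build the command that asks a headless Chrome/Chromium to print the
# DOM of `url` after the page has loaded and its scripts have run.
# `chrome` is an assumed executable name; adjust it to your installation.
function dump_dom_cmd(url::AbstractString; chrome::AbstractString="chromium")
    return `$chrome --headless --disable-gpu --dump-dom $url`
end

# Run the browser and return the serialized DOM as a String.
# Throws if the executable is missing or exits with a non-zero status.
function fetch_rendered_html(url::AbstractString; chrome::AbstractString="chromium")
    return read(dump_dom_cmd(url; chrome=chrome), String)
end
```

Calling `fetch_rendered_html("https://www.collinsdictionary.com/dictionary/english/supremacy")` would then take the place of `String(response.body)` in the question's function. Content that the page loads well after its initial load event may still be missing, so a WebDriver-based approach can remain necessary for some pages.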
Upvotes: 1