Reputation: 91
I have this code which pulls in all the scripts on a page.
require 'nokogiri'
require 'open-uri'

full_url = URI.join(url, "/").to_s # https://www.example.com
doc = Nokogiri::HTML(URI.open(full_url))
doc.css('script').each do |script|
  puts script.content
end
This works and returns every script on the page, but it is more than I need: I only want one script, the one with the class "analytics".
<script class="analytics">
</script>
But I can't find a good way to select only that script by class. Without one, I have to loop through all the other scripts even though I know the value I need is inside this one.
The second issue is that the script contains a bunch of functions, try/catch blocks, and so on. Within it, I only need the values from these two lines:
window.TEST.gameName = "pop1";
window.TEST.gameVersion = "1.1.2";
So I just want to return the values "pop1" and "1.1.2".
There will only be one instance each of window.TEST.gameName and window.TEST.gameVersion, so they are unique. Maybe I am overcomplicating this by using Nokogiri and should just use a regex. Or would Nokogiri be quicker?
I am not tied to Nokogiri either; it just seemed like the most popular option.
I tried a few variations of doc.at and doc.search, but I keep getting nothing back, so I am probably using them incorrectly.
Upvotes: 0
Views: 191
Reputation: 91
Building on Schwern's answer to cover the full question, I used:
doc.css('script.analytics').each do |script|
  @script = script.content
end
That gave me the script, which is about a hundred lines of JavaScript, always in the same format with different values.
I then did:
# The dots are escaped so they match a literal "." rather than any character.
game_name = @script.match(/window\.TEST\.gameName = "(.*?)";/)[1].strip # returns pop1
game_version = @script.match(/window\.TEST\.gameVersion = "(.*?)";/)[1].strip # returns 1.1.2
There are most likely better ways to do it, but that worked for me.
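One caveat with the approach above: match returns nil when the pattern is not found, so calling [1] on the result raises NoMethodError. Here is a minimal sketch of a nil-safe variant, assuming the assignments always use double-quoted string literals. The helper name test_value and the sample script text are made up for illustration.

```ruby
# Hypothetical excerpt standing in for the real analytics script.
script = <<~JS
  try {
    window.TEST.gameName = "pop1";
    window.TEST.gameVersion = "1.1.2";
  } catch (e) {}
JS

# Hypothetical helper: returns the quoted value assigned to window.TEST.<key>,
# or nil when no such assignment is present.
def test_value(js, key)
  # Dots are escaped so they match literally; the key is escaped as well.
  pattern = /window\.TEST\.#{Regexp.escape(key)}\s*=\s*"([^"]*)"/
  # String#[] with a regexp and a group index returns that capture or nil.
  js[pattern, 1]
end

game_name    = test_value(script, 'gameName')
game_version = test_value(script, 'gameVersion')
missing      = test_value(script, 'gameMode') # not in the sample script

puts game_name
puts game_version
p missing
```

Because the helper returns nil instead of raising, the caller can decide what to do when a page does not contain the expected assignment.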
Upvotes: 0
Reputation: 164829
css takes a CSS selector. script.analytics finds script tags with the analytics class.
doc.css('script.analytics').each do |script|
  puts script.content
end
As to the second part, window.TEST.gameName = "pop1"; is JavaScript. Nokogiri cannot help you. You'd need a JavaScript parser.
Upvotes: 1