Paul

Reputation: 91

Extracting variable value from script with Nokogiri - Ruby/Rails

I have this code which pulls in all the scripts on a page.

require 'nokogiri'
require 'open-uri'

full_url = URI.join(url, "/").to_s # e.g. https://www.example.com

doc = Nokogiri::HTML(URI.open(full_url))

doc.css('script').each do |script|
    puts script.content
end

This works and returns every script on the page. However, that is more than I need: I only need one script, the one with the class "analytics":

<script class="analytics">
</script>

But I can't find a good way to select only that script by class. Without one, I have to loop through all the other scripts even though I know the value I need is inside this one.

The second issue is that the script contains a bunch of functions, try/catch blocks, and so on. Out of all that, I only need the values from these two lines:

window.TEST.gameName = "pop1";
window.TEST.gameVersion = "1.1.2";

So I just want to return the values "pop1" and "1.1.2".

There will only be one instance each of window.TEST.gameName and window.TEST.gameVersion, so they are unique. Maybe I am overcomplicating it by using Nokogiri and should just use a regex, or would this way be quicker?

I am not tied to Nokogiri either; it just seemed like the most popular option.

I tried a few variations of doc.at and doc.search, but I keep getting nothing back, so I am probably using them incorrectly.

Upvotes: 0

Views: 191

Answers (2)

Paul

Reputation: 91

Building on Schwern's answer, here is what solved the full question.

I used

doc.css('script.analytics').each do |script|
    @script = script.content
end

That gave me the script, which was about a hundred lines of JavaScript, always in the same format, just with different values.

I then did:

# Dots are escaped so they match literally; String#[] returns nil instead of raising when there is no match.
game_name = @script[/window\.TEST\.gameName\s*=\s*"(.*?)";/, 1] # returns pop1
game_version = @script[/window\.TEST\.gameVersion\s*=\s*"(.*?)";/, 1] # returns 1.1.2

There are most likely better ways to do it, but that worked for me.
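For what it's worth, the two lookups above could be folded into one small helper. This is only a sketch: extract_js_string is a made-up name, the heredoc stands in for script.content, and it assumes each value is a plain double-quoted string literal with no escaped quotes inside it.

```ruby
# Hypothetical helper: returns the string literal assigned to `property`
# in `js`, or nil when no such assignment is found.
def extract_js_string(js, property)
  # Regexp.escape makes the dots in the property name match literally.
  pattern = /#{Regexp.escape(property)}\s*=\s*"([^"]*)"\s*;/
  js[pattern, 1]
end

# Stand-in for script.content; the real script is about a hundred lines.
js = <<~JS
  try {
    window.TEST = window.TEST || {};
    window.TEST.gameName = "pop1";
    window.TEST.gameVersion = "1.1.2";
  } catch (e) {}
JS

game_name    = extract_js_string(js, 'window.TEST.gameName')
game_version = extract_js_string(js, 'window.TEST.gameVersion')
missing      = extract_js_string(js, 'window.TEST.gameId') # not in the script

puts game_name
puts game_version
p missing
```

Returning nil for a missing property means the caller can decide what to do when the page format changes, rather than hitting a NoMethodError on a failed match.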

Upvotes: 0

Schwern

Reputation: 164829

css takes a CSS selector. script.analytics matches script tags that have the class analytics.

doc.css('script.analytics').each do |script|
    puts script.content
end

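As to the second part, window.TEST.gameName = "pop1"; is JavaScript, not HTML. Nokogiri parses the markup and hands you the script body as plain text, so it cannot help you there. To read the value reliably, you'd need a JavaScript parser.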

Upvotes: 1
