user3646037

Reputation: 293

Parse HTML with Nokogiri

I have an HTML document that I need to scrape for certain strings. The document is a YouTube playlist page. For example:

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("https://www.youtube.com/playlist?list=PL11CE9468C379D2C8"))

When I view the HTML source code I can see the string I want.

<tr class="pl-video yt-uix-tile " data-title="Tyler The Creator - Yonkers" data-video-id="XSbZidsgMfw"

The string I want is the quoted value of the data-video-id attribute. This playlist has 7 videos, so the markup contains 7 such rows, each with a different data-video-id. How can I loop through them and save each value to a @scraped_id variable?

Each ID is saved using:

 @video = @stream.videos.find_or_initialize_by(url: @scraped_id)
 @video.save

Upvotes: 1

Views: 1369

Answers (1)

Edd Morgan

Reputation: 2923

You can use a CSS selector to pick out all elements that have a data-video-id attribute, and then take the value of that attribute.

doc.css("[data-video-id]").each do |el|
  @scraped_id = el.attr('data-video-id')
  @video = @stream.videos.find_or_initialize_by(url: @scraped_id)
  @video.save
end

Upvotes: 1
