y.bregey

Reputation: 1499

Extract the value of `href` attributes using Nokogiri

I'm using Nokogiri to parse HTML and select the `a` elements inside elements with class="favourite":

galleries = doc.css(".favourite a")
# doc holds the result of Nokogiri::HTML(source_page.body)

`puts galleries` prints:

<a href="/galleries/6730">...</a>
<a href="/favourites/40565414">...</a>
<a href="/galleries/10851">...</a>
<a href="/favourites/40850848">...</a>

How can I extract only the `href` values that match /galleries/[0-9]+?

Upvotes: 0

Views: 203

Answers (2)

Phrogz

Reputation: 303253

Using more Ruby and less XPath:

doc.css('.favourite a').map{ |a| a['href'][%r{galleries/\d+}] }.compact
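
One thing to be aware of with the regex lookup: `String#[]` with a `Regexp` returns only the matched substring, so the leading slash is dropped unless the pattern includes it. Here is a minimal sketch on plain strings, assuming the `href` values shown in the question. The `hrefs` array is a hypothetical stand-in for `doc.css('.favourite a').map { |a| a['href'] }`:

```ruby
# Hypothetical stand-in for the href values from the question
hrefs = [
  "/galleries/6730",
  "/favourites/40565414",
  "/galleries/10851",
  "/favourites/40850848"
]

# String#[] with a Regexp returns the matched substring, or nil when nothing matches
without_slash = hrefs.map { |href| href[%r{galleries/\d+}] }.compact
# => ["galleries/6730", "galleries/10851"]

# Anchoring the pattern and including the leading slash keeps the whole path
with_slash = hrefs.map { |href| href[%r{\A/galleries/\d+\z}] }.compact
# => ["/galleries/6730", "/galleries/10851"]

# Array#grep keeps only the matching strings, so there are no nils to compact
with_grep = hrefs.grep(%r{\A/galleries/\d+\z})
# => ["/galleries/6730", "/galleries/10851"]
```

Which variant fits depends on whether you want the full path or just the `galleries/<id>` part.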

Upvotes: 1

Jiří Pospíšil

Reputation: 14402

galleries.xpath("@href[contains(., 'galleries')]").map(&:value)
# => ["/galleries/6730", "/galleries/10851"]

Upvotes: 1
