Reputation: 409
I'm using Nokogiri to scrape web pages. The page consists of an unordered list whose list items each contain a link, an image and some text, all wrapped in a div.
I'm trying to find a clean way to extract the elements in each list item so that each li ends up in its own array or hash, like so:
li[0] = ['Acme co 1', 'image1.png', 'Customer 1 details']
li[1] = ['Acme co 2', 'image2.png', 'Customer 2 details']
At the moment I get all the elements in one go, then store them in separate arrays. Is there a better, more idiomatic way of doing this?
This is the code at the moment:
data = Nokogiri::HTML(html)
images = []
names = []
data.css('ul li img').each {|l| images << l}
data.css('ul li a').each {|a| names << a.text }
This is the html I'm working from:
<ul class="customers">
<li>
<div>
<a href='#' class="company-name"> Acme co 1 </a>
<div class="customer-image">
<img src="image1.png"/>
</div>
<div class=" customer-description">
Customer 1 details
</div>
</div>
</li>
<li>
<div>
<a href='#' class="company-name"> Acme co 2</a>
<div class="customer-image">
<img src="image2.png"/>
</div>
<div class=" customer-description">
Customer 2 details
</div>
</div>
</li>
</ul>
Thanks
Upvotes: 1
Views: 310
Reputation: 5213
Assuming the code you have is giving you what you want, I wouldn't try to rewrite anything significant. You can be more brief and idiomatic by replacing your #each methods with #map:
data = Nokogiri::HTML(html)
images = data.css('ul li img')
names = data.css('ul li a').map(&:text)
Upvotes: 2
Reputation: 28285
data = Nokogiri::HTML(html)
images = data.css('ul li img')
names = data.css('ul li a').map(&:text)
This simplifies your code slightly, but your original version wasn't too bad.
Note that my simplification may not generalise if you are, for example, scraping images from multiple regions on the page, in which case reverting to something like your original may be fine.
Upvotes: 1