Reputation: 209
I want to use Mechanize to crawl some web pages and save information from them.
The page is Lookbook North America (http://lookbook.nu/north-america).
I want to iterate through the ul with id="looks"
and, inside that loop, follow the link for every user in the looks. Each user link looks something like this:
<a href="/luciamouet" data-page-track="user name click" data-track="user name click | byline" target="_blank" title="Lucia Mouet">Lucia M.</a>
I want to visit each user's page and store some information from it.
This is what I have so far, but I'm stumped on how to iterate and follow the link for each user:
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'open-uri'
agent = Mechanize.new
page = agent.get('http://lookbook.nu/north-america')
looks = page.parser.css('#looks p')
looks.each do |x|
  puts x
end
Upvotes: 0
Views: 547
Reputation: 54984
Rather than mess around with base + path as suggested by @radubogdan, just use page.uri:
page.search('#looks h1 a').each do |a|
  url = page.uri.merge a[:href]
  page2 = agent.get url
  puts page2.title
end
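To see what the merge step does on its own, here is a minimal sketch using only Ruby's standard uri library, with no network access. The variable names are illustrative stand-ins: listing_uri plays the role of page.uri and href the role of a[:href], using the URL and the link from the question.

```ruby
require 'uri'

# Stand-in for page.uri: the URL of the listing page.
listing_uri = URI('http://lookbook.nu/north-america')

# Stand-in for a[:href]: the root-relative link from the question.
href = '/luciamouet'

# merge resolves the href against the listing URL.
profile_uri = listing_uri.merge(href)
puts profile_uri

# An href that is already absolute replaces the base entirely.
absolute_uri = listing_uri.merge('http://lookbook.nu/neno')
puts absolute_uri
```

Because merge handles both relative and absolute hrefs, the loop does not need to know which kind the site emits.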
Upvotes: 1
Reputation: 2834
You have everything you need to construct the detail page URL. Grab the relative URL (I'll call it path), prepend the base URL, and make a new request.
require 'mechanize'
agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::Page
base = 'http://lookbook.nu'
page = agent.get(base + '/north-america')
detail_pages = page.search("//div[contains(@class, 'look_meta_container')]/p/a[1]/@href").map(&:text)
# ["/user/1069907-Veronica-P", "/elliott_alexzander", "/neno", "/skirtsofurban", "/tovogueorbust", "/dthutt", "/ryapie", "/lovebetweentheracks", "/lonleyboy", "/bobbyraffin", "/tsangtastic", "/user/737385-Katia-H"]
detail_pages.each do |path|
  page = agent.get(base + path)
  name = page.search("//div[@id='userheader']//h1/a").text
  fans = page.search("//span[contains(text(), 'Fans')]/../span[1]").text
  puts name + " have " + fans + " fans"
end
=>
Veronica P have 26,044 fans
Elliott Alexzander have 3,409 fans
Neno Neno have 15,304 fans
Laura P have 975 fans
Alexandra G. have 620 fans
Dayeanne Hutton have 336 fans
Mariah Alysz have 288 fans
Lina Dinh have 11,675 fans
Talal Amine have 882 fans
Bobby Raffin have 72,469 fans
Jenny Tsang have 8,909 fans
Katia H. have 282 fans
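The fan counts above come back as strings with thousands separators. If you want to store or compare them as numbers, something like the following sketch may help. Note that fan_count is a hypothetical helper written for this example, not part of Mechanize or Nokogiri, and it assumes the text contains only digits, commas, and surrounding whitespace.

```ruby
# Hypothetical helper (not part of Mechanize): convert a scraped
# count such as "26,044" into an Integer by dropping the commas.
def fan_count(text)
  text.delete(',').strip.to_i
end

puts fan_count('26,044')
puts fan_count(' 975 ')
```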
Note: I set agent.pluggable_parser.default to Mechanize::Page so that the response is parsed as a Mechanize::Page. Usually you don't need that, but the site doesn't set the Content-Type header correctly.
Upvotes: 1