pnb1

Reputation: 43

Parsing HTML with Nokogiri: not all tags are present

There is this dictionary: Russian dictionary

In Ruby, I am trying to get the URL of the next-page link (">>"), which is:

<a href="m.exe?a=110&sc=4&recno=3506774&dict=&l1=1&l2=2">>></a>

When I inspect this element in the browser, it is present. However, using

require 'nokogiri'
require 'open-uri'
link = "http://www.multitran.ru/c/m.exe?a=110&sc=4&recno=3506179&dict=&l1=1&l2=2"
page = Nokogiri::HTML(URI.open(link)) # Kernel#open no longer opens URLs on Ruby 3.0+
puts page

the link to the next page is not printed. All the links to the alphabet letters are there, but there is no

<a href="m.exe?a=110&sc=4&recno=3506774&dict=&l1=1&l2=2">>></a>

Is this link somehow dynamically generated, so that Ruby doesn't catch it? The "next page" links don't follow any logical sequence, so I can't just increment the URL itself. Any help is appreciated.

Upvotes: 1

Views: 101

Answers (1)

Joe Martinez

Reputation: 854

Your original guess was right: the page is generated dynamically. The server only includes the next-page link for specific user agents.

Try pretending to be Google Chrome like this:

user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
page = Nokogiri::HTML(URI.open(link, 'User-Agent' => user_agent))

Upvotes: 2
