Parsing HTML with Nokogiri: not all tags are present

Question

There is this dictionary: Russian dictionary

In Ruby, I am trying to get the URL of the next page, which is the link labeled ">>".

When I inspect this element in the browser, it is present. However, with this code:

require 'nokogiri'
require 'open-uri'

link = "http://www.multitran.ru/c/m.exe?a=110&sc=4&recno=3506179&dict=&l1=1&l2=2"
page = Nokogiri::HTML(URI.open(link))
puts page

The link to the next page is not printed. All the links to the alphabet letters are there, but there is no ">>" link.

Is this somehow generated dynamically, so that Ruby doesn't catch it? The "next page" links don't follow any logical sequence, so I can't just increment the URL itself. Any help is appreciated.

Joe Martinez · Accepted Answer

Your original guess was right. The page only includes the next link for specific user agents.

Try pretending to be Google Chrome like this:

page = Nokogiri::HTML(URI.open(link, 'User-Agent' => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'))
