Reputation: 3663
How do I write a Mechanize scraper to get the content from every HTML tag on a web page? Or do I need to convert the page to a string and use a regex to get all the content between `<.*?>` and `</.*?>`?
Upvotes: 0
Views: 408
Reputation: 671
For more information on writing a web scraper with Mechanize, take a look at the following tutorials:
Also keep in mind that Mechanize uses the Nokogiri gem under the hood to parse HTML. If you are not attached to Mechanize, consider using Nokogiri directly to parse the HTML tags.
Do not convert the page to a string and use a regex to get the HTML content; regular expressions cannot reliably handle nested tags. See this answer for more information on why that is a bad idea.
As @pguardiario mentioned in the comment below, the code to get the text content of every tag is `page.search('*').map(&:text)` (the `*` selector has to be passed as a string).
Upvotes: 2
Reputation: 66
Are you limited to Mechanize? If not, you could try Watir or plain Selenium to load the web page and get all of its tags in one object.
Upvotes: 1