Username

Reputation: 3663

How do I read the content of every HTML tag using Mechanize?

How do I write a Mechanize scraper to get the content from every HTML tag on a web page? Or do I need to convert the page to a string and use a regex to get all the content between <.*?> and </.*?>?

Upvotes: 0

Views: 408

Answers (2)

2016rshah

Reputation: 671

For more information on writing a web scraper with Mechanize, take a look at the following tutorials:

Also keep in mind that Mechanize uses the Nokogiri gem to do its underlying HTML parsing. If you are not attached to Mechanize, consider using Nokogiri directly to parse the HTML tags.

Do not convert the page to a string and use regex to get the HTML content. See this answer for more information on why that is a bad idea.

--Edit--

As @pguardiario mentioned in the comment below, the code to get the content of every tag is page.search('*').map(&:text). Note that the selector must be quoted: '*' is a CSS selector that matches every element.

Upvotes: 2

Victor Ch.

Reputation: 66

Are you limited to Mechanize? If not, you could try Watir or plain Selenium to get the web page with all of its tags in one object.

Upvotes: 1
