Reputation: 75
How can I extract the header tags, along with their id and text, from a string in Ruby on Rails? I tried Nokogiri::XML(content), but I don't know how to extract the header tags from it. The headers must also keep the order in which they appear in the string: if I use doc.css('h1').each do |h1|, it returns all the h1 tags together, so the order changes. For example:
<h1 id='h1'>Header1</h1>
<h3 id='h3'>Header3</h3>
<h2 id='h2'>Header2</h2>
<h5 id='h5'>Header5</h5>
<h6 id='h6'>Header6</h6>
<h3 id='h33'>Header3</h3>
<h4 id='h4'>Header4</h4>
<h2 id='h22'>Header2</h2>
<h6 id='h66'>Header6</h6>
The result should be:
headers = ["h1", "h3", "h2", "h5", "h6", "h3", "h4", "h2", "h6"]
toc = [{'node':'h1', 'value':'Header1', 'id':'h1' }, {'node':'h3', 'value':'Header3', 'id':'h3' }, {'node':'h2', 'value':'Header2', 'id':'h2' }, {'node':'h5', 'value':'Header5', 'id':'h5' }, {'node':'h6', 'value':'Header6', 'id':'h6' }, {'node':'h3', 'value':'Header3', 'id':'h33' }, {'node':'h4', 'value':'Header4', 'id':'h4' }, {'node':'h2', 'value':'Header2', 'id':'h22' }, {'node':'h6', 'value':'Header6', 'id':'h66' }]
My code:
doc = Nokogiri::XML(content)
Kindly help me solve this.
Upvotes: 1
Views: 792
Reputation: 1995
Shorter answer: simply calling the sort method orders the results as they appear in your source.
heads = Nokogiri::HTML(object.body).css('h1, h2, h3, h4, h5, h6').sort
Upvotes: 3
Reputation: 118271
I would do it as follows:
html_string = <<-html
<h1 id='h1'>Header1</h1>
<h3 id='h3'>Header3</h3>
<h2 id='h2'>Header2</h2>
<h5 id='h5'>Header5</h5>
<h6 id='h6'>Header6</h6>
<h3 id='h33'>Header3</h3>
<h4 id='h4'>Header4</h4>
<h2 id='h22'>Header2</h2>
<h6 id='h66'>Header6</h6>
html
require 'nokogiri'
doc = Nokogiri::HTML(html_string)
# First build the array of element names to search for in the HTML
# document; each name is also a CSS selector (a CSS rule).
header_tags = (1..6).map { |num| "h#{num}" }
# => ["h1", "h2", "h3", "h4", "h5", "h6"]
headers = []
toc = doc.css(*header_tags).map do |node|
headers << node.name
{'node' => node.name, 'value' => node.text, 'id' => node['id'] }
end
If you look at the documentation of the method css(*rules), you will find:
"Search this node for CSS rules. rules must be one or more CSS selectors."
Upvotes: 4