user2138489

Reputation: 75

Extract header tags from a string in Ruby on Rails

I want to extract the header tags, along with their id and text, from a string in Ruby on Rails. I tried Nokogiri::XML(content), but I don't know how to extract the header tags from the result. The order of the headers in the string must not change. If I use doc.css('h1').each do |h1|, it returns all the h1 tags together, so the original order is lost. For example:

<h1 id='h1'>Header1</h1>
<h3 id='h3'>Header3</h3>
<h2 id='h2'>Header2</h2>
<h5 id='h5'>Header5</h5>
<h6 id='h6'>Header6</h6>
<h3 id='h33'>Header3</h3>
<h4 id='h4'>Header4</h4>
<h2 id='h22'>Header2</h2>
<h6 id='h66'>Header6</h6>

The result should be:

headers = ["h1", "h3", "h2", "h5", "h6", "h3", "h4", "h2", "h6"]
toc = [{'node':'h1', 'value':'Header1', 'id':'h1' }, {'node':'h3', 'value':'Header3', 'id':'h3' }, {'node':'h2', 'value':'Header2', 'id':'h2' }, {'node':'h5', 'value':'Header5', 'id':'h5' }, {'node':'h6', 'value':'Header6', 'id':'h6' }, {'node':'h3', 'value':'Header3', 'id':'h33' }, {'node':'h4', 'value':'Header4', 'id':'h4' }, {'node':'h2', 'value':'Header2', 'id':'h22' }, {'node':'h6', 'value':'Header6', 'id':'h66' }]

My code:

doc = Nokogiri::XML(content)

Please help me solve this.

Upvotes: 1

Views: 792

Answers (2)

Kevin Labécot

Reputation: 1995

Shorter answer: simply calling sort on the result orders the nodes as they appear in your source, because Nokogiri nodes are compared by their position in the document.

heads = Nokogiri::HTML(object.body).css('h1, h2, h3, h4, h5, h6').sort

Upvotes: 3

Arup Rakshit

Reputation: 118271

I would do it as follows:

html_string = <<-html
<h1 id='h1'>Header1</h1>
<h3 id='h3'>Header3</h3>
<h2 id='h2'>Header2</h2>
<h5 id='h5'>Header5</h5>
<h6 id='h6'>Header6</h6>
<h3 id='h33'>Header3</h3>
<h4 id='h4'>Header4</h4>
<h2 id='h22'>Header2</h2>
<h6 id='h66'>Header6</h6>
html

require 'nokogiri'

doc = Nokogiri::HTML(html_string)
# Build the array of elements to search for in the HTML document.
# You could also call it an array of CSS rules.
header_tags = (1..6).map { |num| "h#{num}" }
# => ["h1", "h2", "h3", "h4", "h5", "h6"]
headers = []
# When css is given several rules, the matches can come back grouped per
# rule, so sort the nodes to restore document order before mapping.
toc = doc.css(*header_tags).sort.map do |node|
  headers << node.name
  {'node' => node.name, 'value' => node.text, 'id' => node['id'] }
end

If you look at the documentation for the css(*rules) method, you will find:

Search this node for CSS rules. rules must be one or more CSS selectors.

Upvotes: 4
