Martin

Reputation: 144

How to remove specific lines from an HTML file with Ruby

I have a file named payment.html.erb.

Its content is:

<form method="POST" action="http://www.example.com" id="my_id" class="form">
 <input type="hidden" name="Timestamp" value="2013-09-29T08:05:14.Z"/>
 <input type="hidden" name="Signature" value="dd01adafd2689b243d6cbc9088da2bf699976eb0"/>
 <input type="hidden" name="Amount" value="1"/>
<input type="text" name="AccountName" value="" placeholder="account name"/>
<p></p>
<select name="ExpireMonth">
  <option value="8">8</option>
  <option value="9">9</option>
  <option value="10">10</option>
  <option value="11">11</option>
  <option value="12">12</option>
</select>
<select name="ExpireYear">
  <option value="2017">2017</option>
  <option value="2018">2018</option>
  <option value="2019">2019</option>
  <option value="2020">2020</option>
</select>
<input type="submit" class="yyy" id="xxx" value="submit"/>
</form>

I read the file and need to write the result back to the same file (that part is already coded).

I want to remove all non-hidden HTML fields, and also the opening and closing <form> tags.

Thanks

Upvotes: 0

Views: 226

Answers (2)

Jay

Reputation: 9582

  1. Read the file into a string.
  2. Use gsub() to remove the unwanted tags.
  3. Write the string to the file.

#!/usr/bin/env ruby

file_name = 'payment.html.erb'
data = File.read(file_name)
# Remove the hidden <input> tags; [^>]* keeps each match inside one tag.
data.gsub!(/<input[^>]*type="hidden"[^>]*>/, '')
# Remove the opening and closing form tags.
data.gsub!(/<form[^>]*>/, '')
data.gsub!(/<\/form>/, '')
File.write(file_name, data)

Update

I misread the question: the code above removes the hidden inputs. This version removes the input tags that are not hidden instead:

#!/usr/bin/env ruby

file_name = 'payment.html.erb'
data = File.read(file_name)
# Keep each <input> tag only if it is a hidden field; [^>]* stops at the
# end of the tag, so several tags on one line are handled separately.
data.gsub!(/<input[^>]*>/) { |tag| tag.include?('type="hidden"') ? tag : '' }
# Remove the opening and closing form tags.
data.gsub!(/<form[^>]*>/, '')
data.gsub!(/<\/form>/, '')
File.write(file_name, data)

It can be easily modified to remove other types of non-hidden tags.
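
As a rough sketch of such a modification, the helper below also drops the select blocks and the paragraph from the question's markup. The name strip_non_hidden is made up for this example, and the patterns assume markup shaped like the question's file: no attribute value contains a ">" character, and select and p elements are not nested inside themselves.

```ruby
# Hypothetical helper (not from the original answer): returns the markup
# with everything except the hidden inputs removed.
def strip_non_hidden(html)
  result = html.dup
  # Keep an <input> tag only when it is a hidden field.
  result.gsub!(/<input\b[^>]*>/) { |tag| tag.include?('type="hidden"') ? tag : '' }
  # Drop whole <select>...</select> blocks; /m lets . match newlines.
  result.gsub!(/<select\b.*?<\/select>/m, '')
  # Drop <p>...</p> blocks the same way.
  result.gsub!(/<p\b[^>]*>.*?<\/p>/m, '')
  # Drop the opening and closing form tags.
  result.gsub!(/<\/?form\b[^>]*>/, '')
  # Discard every blank line, including the ones left by the removals.
  result.lines.reject { |line| line.strip.empty? }.join
end
```

It would be called as strip_non_hidden(File.read(file_name)) in place of the gsub! calls above, with the result written back to the file.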

Upvotes: 1

the Tin Man

Reputation: 160551

You should use a parser to manipulate HTML or XML unless the content is trivial and you have complete control of it. If you don't own it, or it's not trivial, there are too many things that can go wrong if it changes, which will cause your code to break and either crash or mess up the markup.

Instead, I'd use Nokogiri. It's an excellent parser for XML and HTML, and can make short work of what you're trying to do:

html =<<EOT
<form method="POST" action="http://www.example.com" id="my_id" class="form">
 <input type="hidden" name="Timestamp" value="2013-09-29T08:05:14.Z"/>
 <input type="hidden" name="Signature" value="dd01adafd2689b243d6cbc9088da2bf699976eb0"/>
 <input type="hidden" name="Amount" value="1"/>
<input type="text" name="AccountName" value="" placeholder="account name"/>
<p></p>
<select name="ExpireMonth">
  <option value="8">8</option>
  <option value="9">9</option>
  <option value="10">10</option>
  <option value="11">11</option>
  <option value="12">12</option>
</select>
<select name="ExpireYear">
  <option value="2017">2017</option>
  <option value="2018">2018</option>
  <option value="2019">2019</option>
  <option value="2020">2020</option>
</select>
<input type="submit" class="yyy" id="xxx" value="submit"/>
</form>
EOT

require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse(html)

doc.css('input[type!="hidden"]').remove

form_contents = doc.at('form').children
doc.at('form').replace(form_contents)

puts doc.to_html

Running that outputs:

 <input type="hidden" name="Timestamp" value="2013-09-29T08:05:14.Z"><input type="hidden" name="Signature" value="dd01adafd2689b243d6cbc9088da2bf699976eb0"><input type="hidden" name="Amount" value="1"><p></p>
<select name="ExpireMonth"><option value="8">8</option>
<option value="9">9</option>
<option value="10">10</option>
<option value="11">11</option>
<option value="12">12</option></select><select name="ExpireYear"><option value="2017">2017</option>
<option value="2018">2018</option>
<option value="2019">2019</option>
<option value="2020">2020</option></select>

A parser such as Nokogiri can handle that without problems.

In addition, a parser can handle this valid markup:

<input
  type="text"
  name="AccountName"
  value=""
  placeholder="account name"
/>

Try using a regular expression and gsub to strip that or this:

<input type="text"name="AccountName"value="<your name goes here>"placeholder="account name"/>

Upvotes: 3
