Reputation: 1314
I have used Perl in the past and am now switching to Ruby, or rather learning Ruby alongside Perl.
I am trying to extract data from an XML file that contains entries like this:
<outline type="rss" text="w4kfu's bl0g" title="w4kfu's bl0g" xmlUrl="http://blog.w4kfu.com/?feed=rss" htmlUrl="http://blog.w4kfu.com"/>
I want to extract just the text between the quotes of text="blahblah" and the URL in htmlUrl="http://blahblahblah".
This is my attempt:
ruby -ne 'next if $_ =~ %r[text=\"([^"]*)\"]x and print $1, "\n"' file_name.xml
I know that Ruby tries to be as powerful as Perl while having neater code. This solution seems a bit Perl-ish to me, and I would like to know what the proper Ruby way would be.
Upvotes: 1
Views: 82
Reputation: 37507
I recommend parsing XML with a real parser, which is more robust. For example, it will not produce a false positive if another element happens to have a text attribute, and it will accommodate whitespace and newlines in the XML.
Since you mentioned you don't have access to gems (you should work on this :), here's something using REXML from the standard library. It's not quite as clean as Nokogiri but not too bad.
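require 'rexml/document'
# Read the file explicitly instead of relying on the bare Kernel#open
doc = REXML::Document.new(File.read("file.xml"))
# Visit every <outline> element anywhere in the document
REXML::XPath.each(doc, "//outline") do |element|
  puts element.attributes["title"], element.attributes["htmlUrl"]
end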
Here it is as a Ruby command line that prints the title:
ruby -r 'rexml/document' -e "doc = REXML::Document.new open('file_name.xml')" \
     -e "puts REXML::XPath.each(doc, '//outline').map{|el| el.attributes['title']}"
#=> w4kfu's bl0g
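The question asks for both the text attribute and htmlUrl, so here is a minimal sketch of the same REXML approach returning both. The helper name outline_pairs and the inline sample document are made up for illustration; the sample deliberately spreads the attributes over several lines to show that the parser treats it the same as a single-line element.

```ruby
require 'rexml/document'

# Hypothetical helper (not from the original answer): returns one
# [text, htmlUrl] pair for every <outline> element in the XML string.
def outline_pairs(xml)
  doc = REXML::Document.new(xml)
  pairs = []
  REXML::XPath.each(doc, "//outline") do |el|
    pairs << [el.attributes["text"], el.attributes["htmlUrl"]]
  end
  pairs
end

# Illustrative sample; the wrapper elements are an assumption about the file layout.
sample = <<~XML
  <opml>
    <body>
      <outline type="rss"
               text="w4kfu's bl0g"
               title="w4kfu's bl0g"
               xmlUrl="http://blog.w4kfu.com/?feed=rss"
               htmlUrl="http://blog.w4kfu.com"/>
    </body>
  </opml>
XML

# Print each pair as "text<TAB>url"
outline_pairs(sample).each { |text, url| puts "#{text}\t#{url}" }
```

For a real file, something like outline_pairs(File.read("file_name.xml")) should work the same way.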
But I have a feeling that you really want a regex solution with a more rubyish feel. Here you go:
ruby -ne 'puts $_.scan(/text=\"([^"]*)\"/)' file_name.xml
#=> w4kfu's bl0g
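If you want both text and htmlUrl from the regex version, one possible sketch is below. It assumes text= comes before htmlUrl= on the same line, as in your sample; the constant and method names are just for illustration.

```ruby
# Assumption: text= precedes htmlUrl= on the same line, as in the sample.
# \b keeps "text" from matching inside a longer attribute name.
OUTLINE_RE = /\btext="([^"]*)".*?\bhtmlUrl="([^"]*)"/

# Hypothetical helper: returns an array of [text, htmlUrl] pairs found in a line.
def text_and_url(line)
  line.scan(OUTLINE_RE)
end

line = %q{<outline type="rss" text="w4kfu's bl0g" title="w4kfu's bl0g" xmlUrl="http://blog.w4kfu.com/?feed=rss" htmlUrl="http://blog.w4kfu.com"/>}

# Print each pair as "text<TAB>url"
text_and_url(line).each { |text, url| puts "#{text}\t#{url}" }
```

The same pattern should drop into the one-liner form: ruby -ne 'puts $_.scan(/\btext="([^"]*)".*?\bhtmlUrl="([^"]*)"/).map { |t, u| "#{t}\t#{u}" }' file_name.xml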
Upvotes: 2