user2840647

Reputation: 1314

Ruby Style Regular Expressions

I have used Perl in the past and am now switching to Ruby, or rather learning Ruby alongside Perl.

I am trying to extract data from an XML file that contains lines like this:

        <outline type="rss" text="w4kfu's bl0g" title="w4kfu's bl0g" xmlUrl="http://blog.w4kfu.com/?feed=rss" htmlUrl="http://blog.w4kfu.com"/>

I want to extract just the text between the quotes of text="blahblah" and the URL in htmlUrl="http://blahblahblah".

This is my attempt at solving it:

ruby -ne 'next if $_ =~ %r[text=\"([^"]*)\"]x and print $1, "\n"' file_name.xml

I know that Ruby tries to be as powerful as Perl while allowing neater code. This solution seems a bit Perl-ish to me, and I would like to know what the proper Ruby way would be.

Upvotes: 1

Views: 82

Answers (1)

Mark Thomas

Reputation: 37507

I recommend parsing XML with a real parser, because it is more robust. For example, it will not produce a false positive if another element happens to have a text attribute, and it will cope with whitespace and newlines inside the XML.
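To make that concrete, here is a small sketch. The sample XML is made up for illustration (the note element and the line breaks inside outline are my assumptions, not from your file). A line-by-line regex on text= would also report the note element, while the XPath query only visits outline elements.

```ruby
require 'rexml/document'

# Hypothetical sample: the outline element is split across lines, and an
# unrelated <note> element also carries a text attribute.
xml = <<~XML
  <opml>
    <note text="not a feed"/>
    <outline type="rss"
             text="w4kfu's bl0g"
             htmlUrl="http://blog.w4kfu.com"/>
  </opml>
XML

doc = REXML::Document.new(xml)

# Only <outline> elements are selected, so the <note> is ignored and the
# newlines between attributes make no difference.
pairs = REXML::XPath.match(doc, '//outline').map do |el|
  [el.attributes['text'], el.attributes['htmlUrl']]
end

pairs.each { |text, url| puts "#{text}\t#{url}" }
```
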

Since you mentioned you don't have access to gems (you should work on this :), here's something using REXML from the standard library. It's not quite as clean as Nokogiri but not too bad.

require 'rexml/document'

# File.read returns the file contents and closes the file afterwards
doc = REXML::Document.new File.read("file.xml")
REXML::XPath.each(doc, "//outline") do |element|
  puts element.attributes["title"], element.attributes["htmlUrl"]
end

Here it is as a Ruby command line to print the title:

ruby -r 'rexml/document' -e "doc = REXML::Document.new File.read('file_name.xml')" \
  -e "puts REXML::XPath.each(doc, '//outline').map{|el| el.attributes['title']}"

#=> w4kfu's bl0g

But I have a feeling that you really want a regex solution with a more Rubyish feel. Here you go:

ruby -ne 'puts $_.scan(/text=\"([^"]*)\"/)' file_name.xml

#=> w4kfu's bl0g
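Since you asked for the htmlUrl as well, the same scan idea can capture both attributes at once. This is only a sketch: the variable names are mine, the sample line is the one from your question, and it assumes both attributes are double-quoted and sit on the same line.

```ruby
# Sample line copied from the question.
line = %q(<outline type="rss" text="w4kfu's bl0g" title="w4kfu's bl0g" xmlUrl="http://blog.w4kfu.com/?feed=rss" htmlUrl="http://blog.w4kfu.com"/>)

# Capture the attribute name and its value; \b keeps "text" from matching
# at the end of a longer name.
attr_re = /\b(text|htmlUrl)="([^"]*)"/

# scan returns [name, value] pairs, which to_h turns into a Hash.
attrs = line.scan(attr_re).to_h

puts attrs['text'], attrs['htmlUrl']
```

With two capture groups, scan yields one [name, value] pair per match, so the order of the attributes in the line does not matter.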

Upvotes: 2
