Reputation: 3062
In a Ruby project I'm working with a badly formed XML file that comes from an external source. I only want one value: the rate attribute of the last record node. The XML looks like this (shortened for readability):
<?xml version="1.0" encoding="utf-16"?>
<diagram>
<refresh value="30" />
<margin top="30" bottom="30" left="30" right="30" />
<rates>
<rate value="0" />
<rate value="100" />
<rate value="200" />
</rates>
<data>
<record rate="121" label="" />
<record rate="124" label="" />
<record rate="141" label="" />
<record rate="141" label="" />
<record rate="148" label="" />
<record rate="269" label="6:00" />
<record rate="701" label="" />
<record rate="755" label="" />
<record rate="795" label="" />
<record rate="850" label="7:00" />
<record rate="935" label="" />
<record rate="977" label="" />
</data>
</diagram>
Now all I need is the value of rate in the last record node. I'm not good at regex, but I have been toying around on Rubular and came up with this expression:
<record\b(?:(?=(\s+(?:rate="([^"]*)")|[^\s>]+|\s+))\1)*>
That seemed more or less sufficient: it returns the value, plus an extra "/" that I can't get rid of. But when I run this regex in my own code I run into trouble; I don't seem to get the same results. This is the code I had:
regex = Regexp.new('<record\b(?:(?=(\s+(?:rate="([^"]*)")|[^\s>]+|\s+))\1)*>')
matchdata = regex.match(s)
puts matchdata[0]
I pass the entire XML source to this code in the variable "s", but it only prints empty strings. Can someone help me out here?
Upvotes: 0
Views: 629
Reputation: 160551
Just for the record, here's how to do it on the same XML in two different ways with a parser, and a third way with String#scan:
require 'nokogiri'
doc = Nokogiri::XML(xml)
# using XPath
doc.at('//record[last()]')['rate'] # => "977"
# using CSS
doc.css('record').last['rate'] # => "977"
# using a bit of simple Regex
xml.scan(/<record.+$/).last[/rate="(\d+)"/, 1] # => "977"
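If the input might contain no record lines at all, `.last` returns nil and the chained lookup raises NoMethodError. A minimal sketch of a nil-safe wrapper around the scan approach follows; the method name last_record_rate and the shortened sample are only illustrations, and like the one-liner above it assumes each record sits on its own line:

```ruby
# Hypothetical helper (name chosen for illustration): returns the rate
# attribute of the last <record ...> line as a String, or nil if none exists.
def last_record_rate(xml)
  last_line = xml.scan(/<record.+$/).last   # nil when nothing matched
  last_line && last_line[/rate="(\d+)"/, 1]
end

# Shortened stand-in for the XML in the question.
sample = <<~XML
  <data>
  <record rate="935" label="" />
  <record rate="977" label="" />
  </data>
XML

puts last_record_rate(sample)        # prints 977
p last_record_rate("<data></data>")  # prints nil
```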
Upvotes: 2
Reputation: 8293
This matches a single record: /<record\s+rate="(\d+?)"\s+label="(.*?)"\s+\/>/. To get only the last one, use:
regex = /(?:<record\s+rate="\d+?"\s+label=".*?"\s+\/>[\s\n\r]*)*<record\s+rate="(\d+?)"\s+label="(.*?)"\s+\/>/
s.scan(regex) do |rate, label|
...
end
If you want only the rate, use (?:<record\s+rate="\d+?".*>[\s\n\r]*)*<record\s+rate="(\d+?)".*>.
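As a rough sketch of how the rate-only pattern could be applied, String#[] with a capture index returns just the captured rate. The constant name and the shortened sample below are made up for the example, and the pattern relies on `.` not matching newlines, so it assumes one record per line:

```ruby
# The rate-only pattern from above, stored in a constant (name is illustrative).
RATE_OF_LAST_RECORD = /(?:<record\s+rate="\d+?".*>[\s\n\r]*)*<record\s+rate="(\d+?)".*>/

# Shortened stand-in for the XML in the question.
s = <<~XML
  <data>
  <record rate="850" label="7:00" />
  <record rate="935" label="" />
  <record rate="977" label="" />
  </data>
XML

# The non-capturing group consumes every record except the last one,
# so capture group 1 holds the rate of the final record.
rate = s[RATE_OF_LAST_RECORD, 1]
puts rate   # prints 977
```

The result is a String, so call to_i on it if a number is needed.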
Upvotes: 2