Jasper Kennis

Reputation: 3062

How to regex match the value of the last occurrence of a specific attribute in Ruby

In a Ruby project I'm working with a badly formed XML file that comes from an external source. I only want one value: the rate attribute of the last record node. The XML looks like this (shortened for readability):

<?xml version="1.0" encoding="utf-16"?>
<diagram>
  <refresh value="30" />
  <margin top="30" bottom="30" left="30" right="30" />
  <rates>
    <rate value="0" />
    <rate value="100" />
    <rate value="200" />
  </rates>
  <data>
    <record rate="121" label="" />
    <record rate="124" label="" />
    <record rate="141" label="" />
    <record rate="141" label="" />
    <record rate="148" label="" />
    <record rate="269" label="6:00" />
    <record rate="701" label="" />
    <record rate="755" label="" />
    <record rate="795" label="" />
    <record rate="850" label="7:00" />
    <record rate="935" label="" />
    <record rate="977" label="" />
  </data>
</diagram>

All I need is the value of rate in the last record node. I'm not good at regex, but I have been toying around on Rubular and came up with this expression:

<record\b(?:(?=(\s+(?:rate="([^"]*)")|[^\s>]+|\s+))\1)*>

That seemed more or less sufficient: it returns the value, plus an extra "/" that I can't get rid of. But when I run the same regex in my own code, I don't get the same results. I had this code:

regex = Regexp.new('<record\b(?:(?=(\s+(?:rate="([^"]*)")|[^\s>]+|\s+))\1)*>')
matchdata = regex.match(s)
puts matchdata[0]

I pass the entire XML source to this function as the argument "s", but it only returns empty strings. Can someone help me out here?

Upvotes: 0

Views: 629

Answers (2)

the Tin Man

Reputation: 160551

Just for the record, here's how to do it on the same XML in two different ways with a parser, plus a third way with String#scan:

require 'nokogiri'
doc = Nokogiri::XML(xml)

# using XPath
doc.at('//record[last()]')['rate'] # => "977"

# using CSS
doc.css('record').last['rate'] # => "977"

# using a bit of simple Regex
xml.scan(/<record.+$/).last[/rate="(\d+)"/, 1] # => "977"
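
The String#scan variant can be tried without installing a parser. The following is only a minimal sketch: it assumes each record element sits on its own line, as in the question's sample, and the helper name last_record_rate is hypothetical, not something from the question or from any library.

```ruby
# Shortened copy of the XML from the question, used as sample input.
xml = <<~XML
  <diagram>
    <data>
      <record rate="850" label="7:00" />
      <record rate="935" label="" />
      <record rate="977" label="" />
    </data>
  </diagram>
XML

# Hypothetical helper: returns the rate of the last <record> line as a
# String, or nil when no record (or no numeric rate) is found.
def last_record_rate(xml)
  # scan collects every line fragment starting at "<record"; last picks
  # the final one, and String#[] with a capture index extracts the digits.
  last_line = xml.scan(/<record.+$/).last
  last_line && last_line[/rate="(\d+)"/, 1]
end

puts last_record_rate(xml) # prints 977
```

The nil guard matters when the external source sends a file without any record nodes; without it, calling [] on nil would raise NoMethodError.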

Upvotes: 2

Guilherme Bernal

Reputation: 8293

This matches a single record: /<record\s+rate="(\d+?)"\s+label="(.*?)"\s+\/>/. To get only the last one, use:

regex = /(?:<record\s+rate="\d+?"\s+label=".*?"\s+\/>[\s\n\r]*)*<record\s+rate="(\d+?)"\s+label="(.*?)"\s+\/>/
s.scan(regex) do |rate, label|
  ...
end

If you want only the rate, use (?:<record\s+rate="\d+?".*>[\s\n\r]*)*<record\s+rate="(\d+?)".*>.
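
As an alternative sketch (not the approach given above), the single-record pattern can be combined with String#scan, taking the last match instead of consuming all earlier records inside one large expression. It assumes every record has rate followed by label and whitespace before "/>", as in the question's sample; the variable name record is chosen here for illustration.

```ruby
# Shortened copy of the XML from the question, used as sample input.
s = <<~XML
  <data>
    <record rate="850" label="7:00" />
    <record rate="935" label="" />
    <record rate="977" label="" />
  </data>
XML

# The single-record pattern from this answer.
record = /<record\s+rate="(\d+?)"\s+label="(.*?)"\s+\/>/

# With capture groups, scan returns one [rate, label] pair per match;
# last picks the final pair (both variables are nil if nothing matched).
rate, label = s.scan(record).last

puts rate # prints 977
```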

Upvotes: 2
