jScott

Reputation: 37

How to conditionally check and extract XML elements

I have to parse an XML file that looks like this:

<country id='cid-cia-Ashmore-and-Cartier-Islands' 
  continent='Asia'
  name='Ashmore and Cartier Islands'
  datacode='AT'
  total_area='5'
  government='territory of Australia administered by the Australian Ministry for the Environment'>
  <coasts>Indian Ocean</coasts>
</country>

<country id='cid-cia-Azerbaijan' 
  continent='Asia'
  name='Azerbaijan'
  datacode='AJ'
  total_area='86600'
  population='7676953'
  population_growth='0.78'
  infant_mortality='74.5'
  inflation='85'
  gdp_total='11500'
  indep_date='30 08 1991'
  government='republic'
  capital='Baku'>
  <ethnicgroups name='Russian'>2.5</ethnicgroups>
  <ethnicgroups name='Armenian'>2.3</ethnicgroups>
  <ethnicgroups name='Azeri'>90</ethnicgroups>
  <ethnicgroups name='Dagestani Peoples'>3.2</ethnicgroups>
  <religions name='Muslim'>93.4</religions>
  <religions name='Armenian Orthodox'>2.3</religions>
  <religions name='Russian Orthodox'>2.5</religions>
  <languages name='Russian'>3</languages>
  <languages name='Armenian'>2</languages>
  <languages name='Azeri'>89</languages>
  <borders country='cid-cia-Armenia'>787</borders>
  <borders country='cid-cia-Georgia'>322</borders>
  <borders country='cid-cia-Iran'>611</borders>
  <borders country='cid-cia-Russia'>284</borders>
  <borders country='cid-cia-Turkey'>9</borders>
  <coasts>Caspian Sea</coasts>
</country>

<country id='cid-cia-Bahrain' 
  continent='Asia'
  name='Bahrain'
  datacode='BA'
  total_area='620'
  population='590042'
  population_growth='2.27'
  infant_mortality='17.1'
  inflation='3'
  gdp_total='7300'
  indep_date='15 08 1971'
  government='traditional monarchy'
  capital='Manama'>
  <ethnicgroups name='Arab'>10</ethnicgroups>
  <ethnicgroups name='Asian'>13</ethnicgroups>
  <ethnicgroups name='Bahraini'>63</ethnicgroups>
  <ethnicgroups name='Iranian'>8</ethnicgroups>
  <religions name='Sunni Muslim'>25</religions>
  <religions name='Shia Muslim'>75</religions>
  <coasts>Persian Gulf</coasts>
</country>

I have to parse this XML and grab the name and inflation values, but ONLY for countries that actually have an inflation value.

I have this Rubular setup here: http://rubular.com/r/L7pbX2mm1J with my progress. It returns two matches, which is fine, but look closely at the first match: the country is Ashmore and Cartier Islands, and the XML for that country has no inflation attribute. The regex just keeps scanning down until it finds an inflation value in a later country, and only then ends the match.

I'm wondering if there is a way I can have some sort of conditional operation that checks if there is an inflation key at all, and if so, grab the name value and inflation value...

Thanks in advance!

Upvotes: 0

Views: 130

Answers (3)

Casimir et Hippolyte

Reputation: 89566

You can indeed use Nokogiri. For example:

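require 'nokogiri'

# A local file does not need open-uri; File.open is enough.
doc = File.open('./country.xml') { |f| Nokogiri::XML(f) }

# Select the name and inflation attributes of countries that have an inflation attribute.
doc.xpath('//country[@inflation]/@name | //country/@inflation').each do |res|
  puts res
end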

If you "need" to use a regex, this one should do the job:

<country [^>]*? name='(?<name>[^']+)'[^>]*? inflation='(?<inflation>[^']+)' 
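As a minimal sketch of how a pattern like this could be applied from Ruby: the sample string below is trimmed from the question's data, and the names `SAMPLE`, `COUNTRY_RE` and `pairs` are made up for illustration. The literal spaces are swapped for `\s` here so that attributes sitting on separate lines still match. Because `[^>]` cannot cross the `>` that closes the opening tag, a country without an inflation attribute cannot pick one up from a later country. It still assumes that `name` comes before `inflation` and that no attribute value contains `>`.

```ruby
# Sample trimmed from the question; the real file is assumed to have the same shape.
SAMPLE = <<~DATA
  <country id='cid-cia-Ashmore-and-Cartier-Islands'
    continent='Asia'
    name='Ashmore and Cartier Islands'
    datacode='AT'
    total_area='5'>
    <coasts>Indian Ocean</coasts>
  </country>
  <country id='cid-cia-Azerbaijan'
    continent='Asia'
    name='Azerbaijan'
    inflation='85'
    capital='Baku'>
    <coasts>Caspian Sea</coasts>
  </country>
  <country id='cid-cia-Bahrain'
    continent='Asia'
    name='Bahrain'
    inflation='3'
    capital='Manama'>
  </country>
DATA

# [^>]*? stays inside the opening <country ...> tag, so the match can never
# run on into the next country. \s matches the newline/indent between attributes.
COUNTRY_RE = /<country\s[^>]*?\sname='(?<name>[^']+)'[^>]*?\sinflation='(?<inflation>[^']+)'/

# scan returns one [name, inflation] array per matching country.
pairs = SAMPLE.scan(COUNTRY_RE)
pairs.each { |name, inflation| puts "#{name}: #{inflation}" }
```

With this sample, Ashmore and Cartier Islands is skipped because its opening tag ends before any inflation attribute appears.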

Upvotes: 2

Andy

Reputation: 335

The Ruby standard library includes the XML parser REXML.
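A rough sketch of what that could look like with REXML (on Ruby 3.0 and later it ships as a bundled gem rather than a default library, so it may need to be listed in a Gemfile). The snippet in the question has no root element, so the `<countries>` wrapper below is an assumption, as are the trimmed sample data and the variable names.

```ruby
require 'rexml/document'

# Hypothetical <countries> root added so the sample is well-formed XML.
sample = <<~DATA
  <countries>
    <country name='Ashmore and Cartier Islands' total_area='5'>
      <coasts>Indian Ocean</coasts>
    </country>
    <country name='Azerbaijan' inflation='85' capital='Baku'/>
    <country name='Bahrain' inflation='3' capital='Manama'/>
  </countries>
DATA

doc = REXML::Document.new(sample)

# The XPath predicate [@inflation] keeps only countries that have the attribute.
results = REXML::XPath.match(doc, '//country[@inflation]').map do |country|
  [country.attributes['name'], country.attributes['inflation']]
end

results.each { |name, inflation| puts "#{name}: #{inflation}" }
```

The predicate does the conditional check the question asks about, so no extra `if` is needed in the Ruby code.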

Upvotes: 1

DMKE

Reputation: 4603

Don't use regular expressions for XML. Instead, use a proper XML parser like Nokogiri.

Upvotes: 2
