Naga

Reputation: 367

Regex to extract specific XML tags and name them in XML file

I want to extract the values of two XML tags (FCF and DKY in the sample below) into named capture groups. I can extract and name each value with its own regex, but I cannot capture both with a single regex.

Sample XML:

 <FCF>100</FCF>
 <REF>AAAAAAAAAAAA</REF>
 <DNA></DNA>
 <DDB>1968-08-01 00:00:00</DDB>
 <DKY>00003</DKY>
 <DNT>ARE</DNT>
 <DAD>1170</DAD>
 <DCI>AE</DCI>

Current PowerShell script with regex:

switch -Regex -File $inputxml {
    '<FCF>(?<key1>[-]?\d+)</FCF>' {
        $currentFCF = $matches.Key1
        continue
    }

    '<DKY>(?<key2>.*)</DKY>' {
        $currentDKY = $matches.Key2
        continue
    }
}

I expected something like the following to work, but this regex does not match:

'<FCF>(?<Key1>[-]?\d+)</FCF>[\s\S]*<DKY>(?<Key2>.*)</DKY>'
    {
        $currentFCF = $matches.Key1
        $currentDKY = $matches.Key2
        continue 
    }

Upvotes: 1

Views: 558

Answers (1)

Santiago Squarzon

Reputation: 60518

I think your switch is perfectly fine. switch -File reads the file one line at a time, so a pattern that spans two lines never sees both tags at once. As far as I can tell, matching two different lines with one regex pattern would require loading the whole file into memory, and I don't think that's a route you want to take considering its size is 60 GB+. Instead, you could add a condition to your switch statement that breaks out of the loop once both variables have been populated, so you don't keep reading until EOF:

switch -Regex -File $inputxml {

    { $currentFCF -and $currentDKY } { break }

    '<FCF>(?<key1>[-]?\d+)</FCF>' {
        $currentFCF = $matches.Key1
        continue
    }

    '<DKY>(?<key2>.*)</DKY>' {
        $currentDKY = $matches.Key2
        continue
    }
}
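
For comparison, here is a minimal sketch of the "load everything into memory" route mentioned above. It is only reasonable for files small enough to fit in memory, not for a 60 GB+ file. The temp file name and its contents are assumptions made for illustration, built from the sample in the question. The lazy `[\s\S]*?` is also my own change: with the greedy `[\s\S]*` from the question, a file with several records would pair the first FCF with the last DKY.

```powershell
# Hypothetical sample file, built from the XML fragment in the question.
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-fcf-dky.xml'
@'
 <FCF>100</FCF>
 <REF>AAAAAAAAAAAA</REF>
 <DNA></DNA>
 <DDB>1968-08-01 00:00:00</DDB>
 <DKY>00003</DKY>
 <DNT>ARE</DNT>
'@ | Set-Content -Path $samplePath

# -Raw returns the whole file as a single string, so one pattern can span lines.
# This is the step that does not scale to very large files.
$text = Get-Content -Path $samplePath -Raw

# Lazy [\s\S]*? stops at the first <DKY> that follows the matched <FCF>.
$pattern = '<FCF>(?<Key1>-?\d+)</FCF>[\s\S]*?<DKY>(?<Key2>.*?)</DKY>'

if ($text -match $pattern) {
    $currentFCF = $Matches.Key1
    $currentDKY = $Matches.Key2
}

# Clean up the illustrative temp file.
Remove-Item -Path $samplePath
```

With the sample above, both variables are filled from a single match. For the large file, the line-by-line switch with the early break remains the practical option.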

Upvotes: 2
