Brian

Reputation: 81

PowerShell: extract string from XML file

Trying to extract some strings from a file. Here's a simplified example of the text in the file:

<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>
<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>
<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>

I want to extract just this part of each line: <gtin>xxxxxxx</gtin> and put them into another file.

I do not want the whole line, just the gtin.

Here's what I tried:

Get-Content -Path C:\firstFile.xml -Readcount 1000 | foreach { $_ -match "<gtin1>*</gtin1>" } | out-file C:\gtins.txt

But as you can probably guess it's not working.

Any help is greatly appreciated. I have a feeling this is embarrassingly easy.

Thanks!

Upvotes: 1

Views: 9898

Answers (2)

Bill_Stewart

Reputation: 24525

(Edit: Ansgar Wiechers is right that you shouldn't parse XML with a regular expression; a proper XML parser is strongly preferred.)

You can extract substrings using Select-String and a regular expression. Example:

Get-Content "C:\firstfile.xml" | Select-String '(<gtin>.+</gtin>)' | ForEach-Object {
  $_.Matches[0].Groups[1].Value
}

If you want just the value between the tags, move the ( and ) to surround only the .+ portion of the expression.
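
As a rough sketch of that change, the group below wraps only the text between the tags. The sample lines are copied from the question and inlined so the snippet runs without a file; the variable names `$lines` and `$gtins` are just illustrative, and the lazy `.+?` is my own tweak so the match stops at the first closing tag if a line ever held more than one gtin.

```powershell
# Sample lines from the question, inlined so the sketch runs without a file.
$lines = @(
  '<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>'
  '<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>'
)

# The capture group now surrounds only the value between the tags.
# .+? is lazy, so it stops at the first </gtin> on the line.
$gtins = $lines | Select-String '<gtin>(.+?)</gtin>' | ForEach-Object {
  $_.Matches[0].Groups[1].Value
}

# Emit the values; to save them instead: $gtins | Set-Content 'C:\gtins.txt'
$gtins
```

With a real file, replacing `$lines` with `Get-Content "C:\firstfile.xml"` should behave the same way.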

More information about regular expressions:

PS C:\> help about_Regular_Expressions

Upvotes: 2

Ansgar Wiechers

Reputation: 200213

Do not parse XML with regular expressions.

Use an actual XML parser for extracting data from XML files.

[xml]$xml = Get-Content 'C:\firstfile.xml'
$xml.SelectNodes('//gtin') | Select-Object -Expand '#text'
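
One caveat, in case the real file looks exactly like the simplified sample: that text has no single root element, so the `[xml]` cast would likely fail on it. A possible workaround, sketched below under that assumption, is to wrap the content in a throwaway root element before parsing. The name `root` and the variables `$fragment` and `$gtins` are hypothetical, and `InnerText` is used here in place of `'#text'` purely as an alternative way to read the element text.

```powershell
# Sample fragment from the question; with a real file this could come from
# Get-Content 'C:\firstfile.xml' -Raw instead.
$fragment = @'
<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>
<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>
'@

# 'root' is an arbitrary wrapper name (an assumption), added only so the
# text parses as one well-formed document.
[xml]$xml = "<root>$fragment</root>"

# InnerText returns the text content of each <gtin> element.
$gtins = $xml.SelectNodes('//gtin') | ForEach-Object { $_.InnerText }

# Emit the values; to save them instead: $gtins | Set-Content 'C:\gtins.txt'
$gtins
```

If the actual file already has a root element, the wrapper is unnecessary and the two lines above should be enough.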

Upvotes: 0
