Reputation: 81
Trying to extract some strings from a file. Here's a simplified example of the text in the file:
<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>
<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>
<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>
I want to extract just this part of each line: <gtin>xxxxxxx</gtin> and put them into another file.
I do not want the whole line, just the gtin.
Here's what I tried:
Get-Content -Path C:\firstFile.xml -Readcount 1000 | foreach { $_ -match "<gtin1>*</gtin1>" } | out-file C:\gtins.txt
But as you can probably guess it's not working.
Any help is greatly appreciated. I have a feeling this is embarrassingly easy.
Thanks!
Upvotes: 1
Views: 9898
Reputation: 24525
(Edit: Ansgar Wiechers is right that you shouldn't parse XML using a regular expression, and that proper XML parsing is vastly to be preferred.)
You can extract substrings using Select-String and a regular expression. Example:
Get-Content "C:\firstfile.xml" | Select-String '(<gtin>.+</gtin>)' | ForEach-Object {
    $_.Matches[0].Groups[1].Value
}
If you want just the value between the tags, move the ( and ) to surround only the .+ portion of the expression.
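As a rough sketch of that variant, the snippet below runs the pattern against the three sample lines from the question instead of a file, so it can be tried as-is. The variable names `$lines` and `$gtins` are only illustrative, and the lazy quantifier `.+?` is a substitution of mine for `.+` so that a line containing more than one gtin element would not be swallowed whole; it only picks up the first match per line.

```powershell
# Sample lines copied from the question. In practice they would come from
# something like: Get-Content "C:\firstfile.xml"
$lines = @(
    '<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>'
    '<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>'
    '<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>'
)

# The parentheses surround only the value, so capture group 1 holds the bare number.
$gtins = $lines | Select-String -Pattern '<gtin>(.+?)</gtin>' | ForEach-Object {
    $_.Matches[0].Groups[1].Value
}

$gtins   # 123456789, 789456123, 456789123 (one per line)
```

To get the result into another file, as the question asks, the collected values could then be piped to `Set-Content` or `Out-File`, for example `$gtins | Set-Content C:\gtins.txt`.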
More information about regular expressions:
PS C:\> help about_Regular_Expressions
Upvotes: 2
Reputation: 200213
Use an actual XML parser for extracting data from XML files.
[xml]$xml = Get-Content 'C:\firstfile.xml'
$xml.SelectNodes('//gtin') | Select-Object -Expand '#text'
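One caveat: the simplified sample in the question has no single root element, so it would not load as XML on its own. The sketch below assumes the real file is well-formed and wraps the sample lines in a made-up `<items>` root to illustrate; `$raw`, `$values` and `$elements` are illustrative names. It reads `InnerText` for the bare value and `OuterXml` for the whole `<gtin>...</gtin>` element.

```powershell
# Hypothetical input: the question's lines wrapped in an assumed <items> root
# so that the text is well-formed XML.
$raw = @'
<items>
<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>
<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>
<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>
</items>
'@

[xml]$xml = $raw

# Bare values, e.g. 123456789
$values = $xml.SelectNodes('//gtin') | ForEach-Object { $_.InnerText }

# Whole elements, e.g. <gtin>123456789</gtin>
$elements = $xml.SelectNodes('//gtin') | ForEach-Object { $_.OuterXml }

$elements
```

Either list could then be written out with something like `$elements | Set-Content C:\gtins.txt`.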
Upvotes: 0