Reputation: 4099
I have XML files formatted like this:
<User>
<FirstName>Foo Bar</FirstName>
<CompanyName>Foo</CompanyName>
<EmailAddress>[email protected]</EmailAddress>
</User>
<User>
...
I want to read through all the XML files and output <CompanyName>,<EmailAddress> for each user, like so:
Foo,[email protected]
User2,[email protected]
Blah,[email protected]
I am using the following snippet:
$directory = "\\PC001\Blah"
Function GetFiles ($path) {
    foreach ($item in Get-ChildItem $path) {
        if ( Test-Path $item.FullName -PathType Container) {
            GetFiles ($item.FullName)
        } else {
            $item
        }
    }
}
Foreach ($file in GetFiles($directory)) {
    If ($file.extension -eq '.test') {
        $content = Get-Content $file.fullname
        $pattern = '(?si)<CompanyName>(.*?)</CompanyName>\n<EmailAddress>(.*?)</EmailAddress>'
        $matches = [regex]::matches($content, $pattern)
        foreach ($match in $matches) {
            $matches[0].Value -replace "<.*?>"
        }
    }
}
However, $matches is empty, so there's something wrong with my regex. If I leave out \n<EmailAddress>(.*?)</EmailAddress>, it works. What am I doing wrong?
Upvotes: 0
Views: 43
Reputation: 67978
Try this:
$pattern = '(?si)<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>'
\s* matches any run of whitespace, so spaces and newlines between the two elements are both covered.
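To see the difference between \n and \s* in isolation, here is a minimal sketch. It assumes the text uses Windows (CRLF) line endings; the sample string and the foo@example.com address are made up for illustration and are not taken from the question's files.

```powershell
# Made-up sample with CRLF line endings (an assumption about the input).
$content = "<User>`r`n<CompanyName>Foo</CompanyName>`r`n<EmailAddress>foo@example.com</EmailAddress>`r`n</User>"

# Pattern from the question: requires \n directly after </CompanyName>.
$strict  = '(?si)<CompanyName>(.*?)</CompanyName>\n<EmailAddress>(.*?)</EmailAddress>'
# Relaxed pattern: allows any whitespace between the two elements.
$relaxed = '(?si)<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>'

$strictCount    = [regex]::Matches($content, $strict).Count
$relaxedMatches = [regex]::Matches($content, $relaxed)

# Build the "Company,Email" lines from the two capture groups.
$lines = foreach ($m in $relaxedMatches) {
    '{0},{1}' -f $m.Groups[1].Value, $m.Groups[2].Value
}
$lines
```

Reading the capture groups directly avoids stripping tags with -replace afterwards, and iterating with $m rather than indexing $matches[0] means each match produces its own line.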
Upvotes: 2
Reputation: 174776
The file may contain \r characters (Windows line endings are \r\n), in which case \n alone will not match right after </CompanyName>. Change your regex to one of the following:
$pattern = '(?si)<CompanyName>(.*?)</CompanyName>[\n\r]+<EmailAddress>(.*?)</EmailAddress>'
OR
$pattern = '(?si)<CompanyName>(.*?)</CompanyName>.*?<EmailAddress>(.*?)</EmailAddress>'
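It may also be worth checking how the file is read. Get-Content without -Raw returns an array of lines, and when that array is passed to [regex]::Matches it is converted to a single string with the lines joined by spaces, so no \r or \n is left to match. The sketch below is an illustration under assumptions: it needs PowerShell 3.0 or later for -Raw, and the temporary file name and its contents are made up.

```powershell
# Hypothetical sample file written to a temp path, for illustration only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'users-sample.test'
$xml  = "<User>`r`n<CompanyName>Foo</CompanyName>`r`n<EmailAddress>foo@example.com</EmailAddress>`r`n</User>"
Set-Content -Path $path -Value $xml

$pattern = '(?si)<CompanyName>(.*?)</CompanyName>[\n\r]+<EmailAddress>(.*?)</EmailAddress>'

# Without -Raw: an array of lines, joined with spaces when converted to a string.
$byLine      = Get-Content -Path $path
$byLineCount = [regex]::Matches($byLine, $pattern).Count

# With -Raw: one string with the original line breaks intact.
$raw      = Get-Content -Path $path -Raw
$rawCount = [regex]::Matches($raw, $pattern).Count

# Clean up the temporary file.
Remove-Item -Path $path

"Line-by-line read: $byLineCount match(es); raw read: $rawCount match(es)"
```

If the line-by-line read is what the script uses, a pattern that tolerates spaces (\s* or .*?) can still match, while one that insists on \n or \r cannot.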
Upvotes: 1