Pr0no

Reputation: 4099

How can I get 2 matches in a regex?

I have xml files formatted like this:

<User>
<FirstName>Foo Bar</FirstName>
<CompanyName>Foo</CompanyName>
<EmailAddress>[email protected]</EmailAddress>
</User>
<User>
...

I want to read through all xml files, creating as output <CompanyName>,<EmailAddress>, so:

Foo,[email protected]
User2,[email protected]
Blah,[email protected]

I am using the following snippet:

$directory = "\\PC001\Blah"


Function GetFiles ($path) {
    foreach ($item in Get-ChildItem $path) {
        if ( Test-Path $item.FullName -PathType Container) {
            GetFiles ($item.FullName)
        } else {
            $item
        }
    }
}


Foreach ($file in GetFiles($directory)) {
    If ($file.extension -eq '.test') {
        $content = Get-Content $file.fullname
        $pattern = '(?si)<CompanyName>(.*?)</CompanyName>\n<EmailAddress>(.*?)</EmailAddress>'
        $matches = [regex]::matches($content, $pattern)

        foreach ($match in $matches) {
            $matches[0].Value -replace "<.*?>" 
        }    
    }
}

However, $matches is empty so there's something wrong with my regex. If I leave out \n<EmailAddress>(.*?)</EmailAddress>, it works. What am I doing wrong?

Upvotes: 0

Views: 43

Answers (2)

vks

Reputation: 67978

$pattern = '(?si)<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>'

Try this. \s* matches any run of whitespace, so spaces, \r and \n between the two elements are all covered.
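A minimal sketch of why the original \n can fail while \s* succeeds, assuming the file is read with plain Get-Content (no -Raw): that call returns an array of lines, and converting the array to a single string joins the lines with a space (the default $OFS), so no \n is left to match. The sample lines, the example.com address, and the variable names below are made up for illustration.

```powershell
# Hypothetical sample: the lines Get-Content would return for one <User> block.
$lines = @(
    '<User>',
    '<FirstName>Foo Bar</FirstName>',
    '<CompanyName>Foo</CompanyName>',
    '<EmailAddress>foo@example.com</EmailAddress>',
    '</User>'
)

# Converting the array to a string joins the lines with a space (default $OFS),
# so the text contains no newline characters at all.
$joined = "$lines"

$strict = '(?si)<CompanyName>(.*?)</CompanyName>\n<EmailAddress>(.*?)</EmailAddress>'
$loose  = '(?si)<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>'

# The strict pattern needs a literal newline; the loose one accepts any whitespace.
$strictCount = [regex]::Matches($joined, $strict).Count
$looseCount  = [regex]::Matches($joined, $loose).Count

"strict: $strictCount, loose: $looseCount"   # strict: 0, loose: 1
```

Under these assumptions only the \s* pattern finds the pair, which matches the behaviour described in the question.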

Upvotes: 2

Avinash Raj

Reputation: 174776

The file may contain \r characters (for example, Windows \r\n line endings), so change your regex to one of the following:

$pattern = '(?si)<CompanyName>(.*?)</CompanyName>[\n\r]+<EmailAddress>(.*?)</EmailAddress>'

OR

$pattern = '(?si)<CompanyName>(.*?)</CompanyName>.*?<EmailAddress>(.*?)</EmailAddress>'
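As a rough sketch of how the first pattern could produce the Company,Email output from the question: the [\n\r]+ variant only matches if the line breaks survive in the text, so this assumes the file is read as one string (for example with Get-Content -Raw, available from PowerShell 3.0). The sample text, the example.com addresses, and the names $found and $results are hypothetical; the loop variable avoids $matches because that is an automatic variable in PowerShell.

```powershell
# Hypothetical sample text with Windows line endings, standing in for:
#   $content = Get-Content -Raw $file.FullName
$content = @(
    '<User>',
    '<FirstName>Foo Bar</FirstName>',
    '<CompanyName>Foo</CompanyName>',
    '<EmailAddress>foo@example.com</EmailAddress>',
    '</User>',
    '<User>',
    '<FirstName>Jane</FirstName>',
    '<CompanyName>Blah</CompanyName>',
    '<EmailAddress>jane@example.com</EmailAddress>',
    '</User>'
) -join "`r`n"

$pattern = '(?si)<CompanyName>(.*?)</CompanyName>[\n\r]+<EmailAddress>(.*?)</EmailAddress>'

# Group 1 is the company name, group 2 the email address.
$results = foreach ($found in [regex]::Matches($content, $pattern)) {
    '{0},{1}' -f $found.Groups[1].Value, $found.Groups[2].Value
}

$results
# Foo,foo@example.com
# Blah,jane@example.com
```

Reading the capture groups directly avoids stripping tags with -replace, which would leave the two values run together without a comma.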

Upvotes: 1
