Cylus Hodes

Reputation: 11

Trying to match this using regular expressions in PowerShell

I am trying to use regular expressions to match certain lines in a file, but I am having some trouble.

The file contains text like this:

Mario, 123456789
Luigi, 234-567-890
Nancy, 345 5666 77533
Bowser, 348759823745908732589
Peach, 534785
Daisy, 123-456-7890

I'm trying to match only the numbers that follow either the XXX-XXX-XXX or the XXX XXX XXX pattern.

I've tried a few different ways, but the pattern either matches lines I don't want or reports false for everything.

I'm using PowerShell to do this.

At first I tried:

{$match = $i -match "\d{3}\-\d{3}\-\d{3}|\d{3}\ \d{3}\ \d{3}"
Write-Host $match}

But when I do that, it matches the long string of numbers and XXX-XXX-XXXX.

I read something saying that n would match the exact quantity, so I tried that...

{$match = $i -match "\d{n3}\-\d{n3}\-\d{n3}|\d{n3}\ \d{n3}\ \{n3}"
Write-Host $match}

That made everything false...

So I tried

{$match = $i -match "\d\n{3}\-\d\n{3}\-\d\n{3}|\d\n{3}\ \d\n{3}\ \d\n{3}"
Write-Host $match}

I also tried the lazy quantifier, ?:

{$match = $i -match "\d{3?}\-\d{3?}\-\d{3?}|\d{3?}\ \{3?}\ \{3?}"
Write-Host $match}

Still false...

The final thing I tried was this...

{$match = $i -match "\d[0-9\{3\}\-\d[0-9]\{3\}\-\d[0-9]{3\}|\d[0-9]\{3\}\ \d[0-9]\{3}\ \d[0-9]\{3\}"
Write-Host $match}

Still no luck...

Upvotes: 1

Views: 11797

Answers (6)

zx38

Reputation: 114

You can also use Select-String:

Select-String '(\d{3}[ -]){2}\d{3}$' .\file.txt | % {$_.Line}
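A self-contained sketch of the same command, assuming the question's sample lines are held in an array rather than read from .\file.txt (`$lines`, `$pattern`, and `$hits` are illustrative names). It spells out `-Pattern` and uses `ForEach-Object` instead of the `%` alias:

```powershell
# The question's sample lines, kept in memory instead of in .\file.txt
$lines = @(
    'Mario, 123456789'
    'Luigi, 234-567-890'
    'Nancy, 345 5666 77533'
    'Bowser, 348759823745908732589'
    'Peach, 534785'
    'Daisy, 123-456-7890'
)

# Two rounds of "three digits + space or dash", then three digits at the end of the line
$pattern = '(\d{3}[ -]){2}\d{3}$'

# Select-String emits MatchInfo objects; their .Line property is the whole matching line
$hits = @($lines | Select-String -Pattern $pattern | ForEach-Object { $_.Line })
$hits
```

Only Luigi's line is returned. Because `[ -]` is checked separately for each group, a line with mixed separators such as `123-456 789` would match as well.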

Upvotes: 0

jon Z

Reputation: 16626

When manipulating data in PowerShell, it is usually a good idea to create objects that represent the data (after all, PowerShell is all about objects). Filtering on object properties is usually easier and more robust. Your problem is a good example. Here is what we are after:

  • the persons: $persons
  • where: where (alias for Where-Object)
  • the number of that person: $_.number
  • matches: -match
  • the pattern:
      • starting with three digits: ^\d{3}
      • followed by three digits between dashes or between spaces: (-\d{3}-|\ \d{3}\ )
      • ending with three digits: \d{3}$
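As a quick check of how those pieces fit together, here is a sketch that assembles the pattern from the parts listed above and tests it against a few bare number strings. The variable names and the last two test values are illustrative, not from the question:

```powershell
# Build the pattern from the three pieces described above
$start   = '^\d{3}'               # starting with three digits
$middle  = '(-\d{3}-|\ \d{3}\ )'  # three digits between dashes or between spaces
$end     = '\d{3}$'               # ending with three digits
$pattern = $start + $middle + $end

# Bare values as they would appear in a number property (the last two are made up)
$numbers = '123456789', '234-567-890', '345 5666 77533', '123-456-7890', '123 456 789', '123-456 789'

foreach ($n in $numbers) {
    # -f formats each value next to its True/False result
    '{0,-16} {1}' -f $n, ($n -match $pattern)
}
```

Because the alternation repeats the separator inside each branch, both separators must be the same: `123 456 789` passes but `123-456 789` does not.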

Below is the entire script:

$persons = import-csv -Header "name", "number" -delimiter "," data.csv
$persons | where {$_.number -match "^\d{3}(\-\d{3}\-|\ \d{3}\ )\d{3}$"}
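The same filter can be tried without a data.csv file by feeding the lines to ConvertFrom-Csv, which takes the same -Header and -Delimiter parameters as Import-Csv but reads strings from the pipeline. This sketch assumes the question's sample lines; the `.Trim()` call is an added precaution in case the space after each comma ends up in the number field:

```powershell
# The question's sample lines, in memory instead of in data.csv
$data = @(
    'Mario, 123456789'
    'Luigi, 234-567-890'
    'Nancy, 345 5666 77533'
    'Bowser, 348759823745908732589'
    'Peach, 534785'
    'Daisy, 123-456-7890'
)

# Every line is a data row because -Header supplies the column names
$persons = $data | ConvertFrom-Csv -Header 'name', 'number' -Delimiter ','

# Trim() removes any whitespace left over from ", " before the anchored match
$valid = @($persons | Where-Object { $_.number.Trim() -match '^\d{3}(\-\d{3}\-|\ \d{3}\ )\d{3}$' })
$valid | Format-Table name, number
```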

Upvotes: 0

Shay Levy

Reputation: 126772

The following pattern gives two matches:

Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-\s]\d{3}[-\s]\d{3}'}

Luigi, 234-567-890
Daisy, 123-456-7890

If you want to exclude the last match, add the '$' anchor (it represents the end of the string):

Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-\s]\d{3}[-\s]\d{3}$'}

Luigi, 234-567-890

If you want to be very specific and match lines from start to end, also use the ^ anchor, which denotes the start of the string:

Get-Content .\test.txt | Where-Object {$_ -match '^\w+,\s+\d{3}[-\s]\d{3}[-\s]\d{3}$'}

Luigi, 234-567-890
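The three variants can be compared side by side on the question's sample lines kept in memory, so no test.txt is needed. The character class is written `[-\s]`: inside square brackets `|` is an ordinary character, so `[-|\s]` would also accept a literal pipe between the groups. The names `$lines` and `$patterns` are illustrative:

```powershell
# The question's sample lines, in memory instead of in .\test.txt
$lines = @(
    'Mario, 123456789'
    'Luigi, 234-567-890'
    'Nancy, 345 5666 77533'
    'Bowser, 348759823745908732589'
    'Peach, 534785'
    'Daisy, 123-456-7890'
)

# [-\s] means "a dash or any whitespace character"
$patterns = [ordered]@{
    'unanchored' = '\d{3}[-\s]\d{3}[-\s]\d{3}'
    'end anchor' = '\d{3}[-\s]\d{3}[-\s]\d{3}$'
    'full line'  = '^\w+,\s+\d{3}[-\s]\d{3}[-\s]\d{3}$'
}

foreach ($name in $patterns.Keys) {
    # Collect the lines that each pattern accepts and print them on one row
    $hits = @($lines | Where-Object { $_ -match $patterns[$name] })
    '{0,-11}: {1}' -f $name, ($hits -join ' | ')
}
```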

Upvotes: 1

FailedDev

Reputation: 26930

Try this:

'(\d+[- ])+\d+'

It's better not to use such rigid regular expressions unless you are absolutely sure that your input will not change.

This regex matches one or more digits followed by a space or a dash, repeats that group greedily as many times as possible (at least once), and then requires at least one more digit.
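To see the trade-off this answer mentions, here is a sketch that runs the looser pattern (written as a PowerShell string, without `/.../` delimiters) over the question's sample lines and collects the part of each line that matched. `$lines`, `$pattern`, and `$found` are illustrative names:

```powershell
# The question's sample lines
$lines = @(
    'Mario, 123456789'
    'Luigi, 234-567-890'
    'Nancy, 345 5666 77533'
    'Bowser, 348759823745908732589'
    'Peach, 534785'
    'Daisy, 123-456-7890'
)

# One or more digits plus a dash or space, repeated, then one or more digits
$pattern = '(\d+[- ])+\d+'

# -match on a single string fills $Matches; $Matches[0] is the matched text
$found = foreach ($line in $lines) {
    if ($line -match $pattern) { $Matches[0] }
}
$found
```

Besides Luigi's number, this also returns `345 5666 77533` (Nancy) and `123-456-7890` (Daisy), which do not have the XXX-XXX-XXX or XXX XXX XXX shape, so whether the flexibility helps depends on how strict the input rules are.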

Upvotes: 0

Daniel Richnak

Reputation: 1604

As Gideon said, your first attempt is the best place to start.

"\b\d{3}\-\d{3}\-\d{3}\b|\b\d{3}\ \d{3}\ \d{3}\b"

The \b added before and after each alternative is a word boundary: a zero-width position where a word character (letter, digit, or underscore) meets a non-word character (such as a space, comma, period, or dash) or the start or end of the string. This ensures that 9999 doesn't match, while 999. does.
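A short sketch of the word-boundary behaviour, first on the two examples from the answer and then on the question's sample lines (`$lines`, `$pattern`, and `$hits` are illustrative names):

```powershell
# \b consumes no characters; it only checks the position
'999.' -match '\b\d{3}\b'    # True: '.' is a non-word character
'9999' -match '\b\d{3}\b'    # False: each three-digit run touches another digit

# The question's sample lines
$lines = @(
    'Mario, 123456789'
    'Luigi, 234-567-890'
    'Nancy, 345 5666 77533'
    'Bowser, 348759823745908732589'
    'Peach, 534785'
    'Daisy, 123-456-7890'
)

$pattern = '\b\d{3}\-\d{3}\-\d{3}\b|\b\d{3}\ \d{3}\ \d{3}\b'
$hits = @($lines | Where-Object { $_ -match $pattern })
$hits
```

Only Luigi's line is returned: in Daisy's 123-456-7890 the third group is followed by another digit, so there is no boundary after it. Because - is also a non-word character, a value such as 123-456-789-012 would still match on its first three groups.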

Upvotes: 0

Gideon Engelberth

Reputation: 6155

Your first attempt is the closest. {3} matches exactly three occurrences of the preceding token, so \d{3} means exactly three digits. I think the n you saw was supposed to represent any number, not a literal n character. The reason it matches the long strings is that you only specified that the match must find 3 digits, a dash or space, 3 digits, a dash or space, and then 3 more digits. You did not specify that it doesn't count if more digits come after that.
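A short sketch of that point, using the question's first pattern on Daisy's line: -match succeeds as soon as the pattern occurs anywhere in the string, and $Matches[0] shows exactly which part matched (`$ok` and `$matched` are illustrative names):

```powershell
# The question's first pattern, without the surrounding braces
$pattern = '\d{3}\-\d{3}\-\d{3}|\d{3}\ \d{3}\ \d{3}'

# -match only needs the pattern to occur somewhere in the string
$ok = 'Daisy, 123-456-7890' -match $pattern
$matched = $Matches[0]

$ok         # True
$matched    # 123-456-789 (the trailing 0 is simply left over)
```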

To not match when there is a number after, you can use a negative lookahead.

(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})(?!\d)
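A quick sketch of the lookahead on two of the sample values. The last lines go one step beyond the answer: `(?<!\d)` is a negative lookbehind (supported by the .NET regex engine behind -match), added here as an illustration for numbers with an extra digit in front; `$both` is an illustrative name:

```powershell
$pattern = '(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})(?!\d)'
'Daisy, 123-456-7890' -match $pattern    # False: a digit follows the third group
'Luigi, 234-567-890'  -match $pattern    # True

# The lookahead only guards the right side; a leading extra digit still slips through
'ID 1234-567-890' -match $pattern        # True (matches 234-567-890)

# A negative lookbehind guards the left side as well
$both = '(?<!\d)(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})(?!\d)'
'ID 1234-567-890' -match $both           # False
```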

Alternatively, if you only want to match at the end of the line, possibly followed by trailing whitespace:

(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})\s*$
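A minimal sketch of the end-of-line variant; the first and last test strings are made up for illustration:

```powershell
$pattern = '(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})\s*$'
'Luigi, 234-567-890   ' -match $pattern    # True: \s* absorbs the trailing spaces
'Daisy, 123-456-7890'   -match $pattern    # False: 890 is preceded by 7, not a separator
'x 111-222-333 extra'   -match $pattern    # False: the number is not at the end of the line
```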

Upvotes: 0
