physlexic

Reputation: 868

Powershell force 5 digit format in a string pattern

What I have now (below) searches for XX-##-# and forces it to XX-##-#0000.

How can I do this to return XX-##-0000#?

Is there a way to force 5 digits at the end, padding with leading 0s, to cover the other possibilities (XX-##-##, XX-##-###, XX-##-####), as opposed to copying this 4 times and slightly adjusting each copy?

$Pattern1 = '[a-zA-Z][a-zA-Z]-[0-9][0-9]-[0-9]'
Get-ChildItem 'C:\path\to\file\*.txt' -Recurse | ForEach {
     (Get-Content $_ | 
     ForEach  { $_ -replace $Pattern1, ('$&'+'0000')}) |  # $& is the whole match; the pattern has no capture group for $1
     Set-Content $_
}

Thanks.

EDIT: I would like to do the following

Search           Replacement
XX-##-#          XX-##-0000#
XX-##-##         XX-##-000##
XX-##-###        XX-##-00###
XX-##-####       XX-##-0####

Upvotes: 1

Views: 3452

Answers (6)

js2010

Reputation: 27463

Simple example. Search numbers at the end of a string.

$text = 'aa-11-123'
$text -match '\d+$'  # sets $matches
$result = ($matches.0).padleft(5,'0')
$text -replace '\d+$', $result  # \d* won't work right

aa-11-00123

Upvotes: 0

Bohemian

Reputation: 425073

Use a look-ahead:

$Pattern1 = '(?<=[a-zA-Z][a-zA-Z]-\d\d-)(?=\d)(?!\d\d)'
Get-ChildItem 'C:\path\to\file\*.txt' -Recurse | ForEach {
    (Get-Content $_ | 
    # Insert four zeros in front of a single trailing digit: XX-##-# becomes XX-##-0000#
    ForEach  { $_ -replace $Pattern1, '0000' }) | 
    Set-Content $_
}

The look-ahead (?=\d) asserts, without consuming, that the next char is a digit.

The negative look-ahead (?!\d\d) asserts that the next two chars are not both digits, so you don't end up with XX-##-0000##.

Also note that \d (a "digit") matches everything [0-9] does (and is easier to code). In .NET it also matches other Unicode decimal digits, which makes no difference for plain ASCII input.

I think you have to do the replacement 4 times.
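
A minimal sketch of what "4 times" could look like without copying the pipeline, assuming the look-arounds are adjusted per digit count. The sample line and the variable names ($prefix, $line, $n) are illustrative and not part of the original answer:

# Look-behind for the XX-##- prefix; it is asserted, not consumed.
$prefix = '(?<=\b[a-zA-Z]{2}-\d\d-)'

# Hypothetical sample line covering every digit count from 1 to 5.
$line = 'AB-12-3 CD-34-56 EF-56-789 GH-78-1234 IJ-90-12345'

foreach ($n in 1..4) {
    # Match the empty position that is followed by exactly $n digits.
    $pattern = "$prefix(?=\d{$n}(?!\d))"
    # Insert the missing leading zeros at that position.
    $line = $line -replace $pattern, ('0' * (5 - $n))
}

$line   # AB-12-00003 CD-34-00056 EF-56-00789 GH-78-01234 IJ-90-12345

Values that are already padded to 5 digits no longer match any of the later patterns, so each value is only changed once.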

Upvotes: 0

Kowalchick

Reputation: 448

This is ancillary information that should help you fix your current code and come to the correct conclusion.

https://technet.microsoft.com/en-us/library/ee692795.aspx

I recommend utilizing a combination of the techniques listed in this documentation. The example provided is very helpful in numeric formatting:

$a = 348 
"{0:N2}" -f $a
"{0:D8}" -f $a
"{0:C2}" -f $a
"{0:P0}" -f $a
"{0:X0}" -f $a

Output
348.00
00000348
$348.00
34,800 %
15C

You can also utilize [String]::Format and add in some assertions to ensure the item is formatted properly; if a specific value is not specified, for example, you could simply default it to 0.

https://blogs.technet.microsoft.com/heyscriptingguy/2013/03/11/understanding-powershell-and-basic-string-formatting/
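
As a rough illustration of how the D format specifier could be applied to the question's pattern, here is a small sketch. The sample value and the variable names ($text, $parts, $padded, $unpadded) are assumptions for demonstration only:

# Hypothetical sample value; not taken from the question's files.
$text = 'aa-11-123'

# Split into the three dash-separated parts.
$parts = $text -split '-'

# D5 only pads integer types, so cast the digit string to [int] first.
$parts[2] = '{0:D5}' -f [int]$parts[2]

$padded = $parts -join '-'
$padded     # aa-11-00123

# Without the cast the specifier is ignored, because the argument is a string.
$unpadded = '{0:D5}' -f '123'
$unpadded   # 123

The cast is the part that is easy to miss: regex matches are strings, and numeric format specifiers have no effect on them.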

Hope this helped.

Upvotes: 3

BenH

Reputation: 10044

Create capturing groups. Combine them while applying the formatting to the second.

Edit: Updated to remove the assumption that the line is only the matching string. Note, the assumption that there is only one match per line still exists.

$Pattern1 = '^(.*?)([a-zA-Z][a-zA-Z]-\d\d-)(\d+)(.*)$'
Get-ChildItem 'C:\path\to\file\*.txt' -Recurse |
    ForEach-Object {
        (Get-Content $_ | 
            ForEach-Object {
                if ($_ -match $Pattern1) {
                     "{0}{1}{2:D5}{3}" -f $matches[1],$matches[2],[int]$matches[3],$matches[4]
                } else {
                    $_
                }
            }) | Set-Content -Path $_
    }

Upvotes: 1

mklement0

Reputation: 438143

Unfortunately, Windows PowerShell's -replace operator doesn't support passing an expression (script block) as the replacement string, which a succinct solution would require here.

However, you can use the appropriate [regex] .NET type's .Replace() method overload:

Note: This solution focuses just on the regex-based replacement part, but it's easy to embed it into the larger pipeline from the question.

# Define sample array.
$lines = @'
Line 1 AB-00-0 and also AB-01-1
Line 2 CD-02-22 after
Line 3 EF-03-333 it
Line 4 GH-04-4444 goes 
Line 5 IJ-05-55555 on
'@ -split "`n"

# Loop over lines...
$lines | ForEach-Object {
  # ... and use a regex with 2 capture groups to capture the substrings of interest
  #     and use a script block to piece them together with number padding
  #     applied to the 2nd group
  ([regex] '\b([a-zA-Z]{2}-[0-9]{2}-)([0-9]+)').Replace($_, { 
    param($match)
    $match.Groups[1].Value + '{0:D5}' -f [int] $match.Groups[2].Value
  })
}

The above yields:

Line 1 AB-00-00000 and also AB-01-00001
Line 2 CD-02-00022 after
Line 3 EF-03-00333 it
Line 4 GH-04-04444 goes 
Line 5 IJ-05-55555 on
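
For newer versions, the following sketch may be simpler: as far as I know, PowerShell (Core) 6.1 and later accept a script block as the replacement operand of -replace, with the match object available as $_. It will not work in Windows PowerShell 5.1. The variable names $line and $result are illustrative:

# Assumes PowerShell 6.1 or later; earlier versions do not accept a script block here.
$line = 'Line 1 AB-00-0 and also AB-01-1'

$result = $line -replace '\b([a-zA-Z]{2}-[0-9]{2}-)([0-9]+)', {
    # $_ is the [System.Text.RegularExpressions.Match] for the current match.
    $_.Groups[1].Value + ('{0:D5}' -f [int] $_.Groups[2].Value)
}

$result   # Line 1 AB-00-00000 and also AB-01-00001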

Upvotes: 4

JohnLBevan

Reputation: 24430

#example of the text we'd load in from file / wherever
$value = @'
this is an example of a value to be replaced: 1AB-23-45 though
we may also want to replace 0CD-87-6 or even 9ZX-00-12345 that
'@

#regex to detect the #XX-##-##### pattern you mentioned (\b word boundaries included so we don't pick up these patterns if they're somehow part of a larger string; though that seems unlikely in this case) 
$pattern = '\b(\d[A-Z][A-Z])-(\d\d)-(\d{1,5})\b'
#what we want our output to look like; with placeholders 0, 1, & 2 taking values from our captures from the above.
$format = '{0}-{1}-{2:D5}'

<#
 # #use select-string to allow us to capture every match, rather than just the first
 # $value | Select-String -Pattern $pattern -AllMatches | %{
 #     #loop through every match replacing the matched string with the reformatted version of itself
 #     $_.matches | %{
 #         #NB: we have to convert the match in group#3 to int to ensure the {2:D5} formatting from above will be applied as expected
 #         $value = $value -replace "\b$($_.value)\b", ($format -f $_.Groups[1].Value,$_.Groups[2].Value,([int]$_.Groups[3].Value))
 #     }
 # }
#>

#or this version's a little more efficient; using the matched positions to replace the strings in those positions with our new formatted value
$value | Select-String -Pattern $pattern -AllMatches | %{
    $_.matches | sort index -Descending | %{
        $value = $value.remove($_.index, $_.value.length).insert($_.index, ($format -f $_.Groups[1].Value,$_.Groups[2].Value,([int]$_.Groups[3].Value)))
    }
}


$value

Upvotes: 0
