wtjones

Reputation: 4160

PowerShell: Leave item alone if regex doesn't match

I have a list of PDF files (from daily processing), some with date stamps in various formats and some without.

Example:

$f = @("testLtr06-09-02.pdf", "otherletter.pdf","WelcomeLtr043009.pdf")

I am trying to remove the date stamp by stripping out dashes and then replacing any run of consecutive digits (4 or more; I may change this to 6) with the string "DATESTAMP".

So far I have this:

$d =  $f | foreach {$_ -replace "-", ""} | foreach { $_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"}
echo $d

The output:

testLtrDATESTAMP.pdf
DATESTAMPoDATESTAMPtDATESTAMPhDATESTAMPeDATESTAMPrDATESTAMPlDATESTAMPeDATESTAMPtDATESTAMPtDATESTAMPeDATESTAMPrDATESTAMP.DATESTAMPpDATESTAMPdDATESTAMPfDATESTAMP
WelcomeLtrDATESTAMP.pdf

It works fine if the file name has a date stamp, but when it doesn't, the -replace goes haywire and inserts DATESTAMP before and after every character. Is there a way to fix this? I tried changing it to a foreach loop, but I couldn't figure out how to get true/false from the regex.

Thanks in advance.

Upvotes: 1

Views: 4309

Answers (2)

jitter

Reputation: 54605

$_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"

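means: in $_, replace every occurrence of the string produced by ([regex]::Matches($_ , "\d{4,}")) with "DATESTAMP". That string is used as the pattern, so when there is a date stamp the pattern is simply the matched digits (e.g. 043009).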

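In a file name with no timestamp (more precisely, no run of at least 4 consecutive digits) there is no match. [regex]::Matches then returns an empty MatchCollection, which becomes "" (an empty string) when -replace converts it to a pattern.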

So every empty string gets replaced with DATESTAMP, and an empty pattern matches at every position: at the start of the string, between every pair of characters, and at the end.

That's why you get this long string with every character surrounded by DATESTAMP.
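A small sketch that reproduces the behavior described above step by step (the variable names $name, $found, $pattern and $stamp are just for illustration):

```powershell
# Reproduces why the original pipeline misbehaves; values are for demonstration.
$name = "otherletter.pdf"

# No run of 4+ digits here, so the MatchCollection is empty.
$found = [regex]::Matches($name, "\d{4,}")
$found.Count                      # 0

# Converting the empty collection to a string gives "", which is what
# -replace ends up using as its pattern.
$pattern = "$found"
$pattern.Length                   # 0

# An empty pattern matches at every position, so the replacement text is
# inserted before, between, and after the characters.
"abc" -replace "", "X"            # XaXbXcX

# With a date stamp, the converted collection is just the matched digits.
$stamp = [regex]::Matches("WelcomeLtr043009.pdf", "\d{4,}")
"$stamp"                          # 043009
```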


To check whether your string contains \d{4,} at all, you should be able to use

[regex]::IsMatch($_, "\d{4,}")
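Since the question asked how to get true/false out of the regex in a foreach loop: [regex]::IsMatch returns $true or $false, so it can drive an if/else directly. A minimal sketch using the sample $f from the question (the $name and $clean variable names are illustrative):

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# [regex]::IsMatch returns a Boolean, so it can be used as the if condition.
$d = foreach ($name in $f) {
    $clean = $name -replace "-", ""                # strip dashes first
    if ([regex]::IsMatch($clean, "\d{4,}")) {
        $clean -replace "\d{4,}", "DATESTAMP"      # date stamp found
    } else {
        $clean                                     # no date stamp: leave it alone
    }
}
$d
# testLtrDATESTAMP.pdf
# otherletter.pdf
# WelcomeLtrDATESTAMP.pdf
```

For a single string, $clean -match "\d{4,}" returns the same Boolean, which is what the -match pipeline in this answer relies on.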

I'm not a PowerShell user, but this line alone should do the job. I'm not sure, though, whether the if can be used inside a pipeline like this, or whether the assignment and the echo $d are needed.

$f | foreach-object {$_ -replace "-", ""} | foreach-object {if ($_ -match "\d{4,}") { $_ -replace "\d{4,}", "DATESTAMP"} else { $_ }}
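The if/else does work inside a ForEach-Object script block, but it isn't strictly required: -replace returns the input string unchanged when its pattern doesn't match, so the original problem was only the pattern built from [regex]::Matches. A sketch of the shorter form, assuming the same $f as in the question:

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# -replace leaves a string untouched when nothing matches, so no if/else is needed.
$d = $f | ForEach-Object { ($_ -replace "-", "") -replace "\d{4,}", "DATESTAMP" }
$d
# testLtrDATESTAMP.pdf
# otherletter.pdf
# WelcomeLtrDATESTAMP.pdf
```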

Upvotes: 2

Shay Levy

Reputation: 126732

You can simply do:

PS > $f -replace "(\d{2}-){2}\d{2}|\d{4,}","DATESTAMP"
testLtrDATESTAMP.pdf
otherletter.pdf
WelcomeLtrDATESTAMP.pdf
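For readability, the same pattern can be written with the .NET inline (?x) option (IgnorePatternWhitespace), which ignores unescaped whitespace in the pattern and treats # through end of line as a comment. This is only a commented restatement of the one-liner above and should produce the same three names:

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")

# Same pattern as the one-liner, spread over lines with (?x) so it can be commented.
$pattern = @'
(?x)
  (\d{2}-){2}\d{2}   # dashed date, e.g. 06-09-02
  | \d{4,}           # or a run of 4+ digits, e.g. 043009
'@

# -replace applies the pattern to every element of the array.
$d = $f -replace $pattern, "DATESTAMP"
$d
```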

Upvotes: 4
