Vid Man

Reputation: 67

Replacing any content in between the second and third underscore

I have a PowerShell one-liner that replaces (deletes) the characters between the second and third underscore with a single "_":

get-childitem *.pdf | rename-item -newname { $_.name -replace '_\p{L}+, \p{L}+_', "_"}

Examples:

12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf
12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf
12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf

This _\p{L}+, \p{L}+_ regex only works for the first example. To replace everything in between, I tried _(?:[^_]*)_([^_]*)_ (according to regex101 this should almost work), but the output is:

12345_09_MoreText.pdf

The desired output would be:

 12345_00001_09_2018_Text_MoreText.pdf
 12345_00002_09_2018_Text_MoreText.pdf
 12345_00003_09_2018_Text_MoreText.pdf

How do I correctly replace the second and third underscore and everything in between with a single "_"?

Upvotes: 2

Views: 415

Answers (4)

Lee_Dailey

Reputation: 7489

here's one other way ... using string methods.

'12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'.
    Split('_').
    Where({
        $_ -notmatch ','
        }) -join '_'

result = 12345_00003_09_2018_Text_MoreText.pdf

that does the following ...

  • split on the underscores
  • toss out any item that has a comma in it
  • join the remaining items back into a string with underscores

i suspect that the pure regex solution will be faster, but you may want to use this simply to have something that is easier to understand when you next need to modify it. [grin]
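
As a minimal sketch, the same split / filter / join steps can be run over all three sample names from the question. The variable names $names and $results are illustrative only, and the approach assumes that the name token is the only one containing a comma; the .Where() method needs PowerShell v4 or later.

```powershell
# Sample names copied from the question; no files are touched here.
$names = @(
    '12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf'
    '12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf'
    '12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'
)

# Split on "_", drop every token that contains a comma, rejoin with "_".
$results = foreach ($name in $names) {
    $name.Split('_').Where({ $_ -notmatch ',' }) -join '_'
}

$results
```

Note that a comma anywhere else in a file name would remove that token as well, so this fits best when the input is as regular as in the examples.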

Upvotes: 2

mklement0

Reputation: 439277

To offer an alternative solution that avoids a complex regex: The following is based on the -split and -join operators and shows PowerShell's flexibility with respect to array slicing:

Get-ChildItem *.pdf | Rename-Item -NewName { ($_.Name -split '_')[0..1 + 3..6] -join '_' } -WhatIf
  • $_.Name -split '_' splits the filename by _ into an array of tokens (substrings).
  • Array slice [0..1 + 3..6] combines two range expressions (..) to essentially remove the token with index 2 from the array.
  • -join '_' reassembles the modified array into a _-separated string, yielding the desired result.

Note: 6, the upper array bound, is hard-coded above, which is suboptimal, but sufficient with input as predictable as in this case.
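
To make the slice and the caveat about the hard-coded bound concrete, here is a small sketch that works on plain strings rather than files. The first name is taken from the question; the second one, with an extra underscore, is a hypothetical input added purely for illustration.

```powershell
# Sample name from the question: 7 tokens, indices 0..6.
$name = '12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf'
$tokens = $name -split '_'

# 0..1 + 3..6 evaluates to the index list 0, 1, 3, 4, 5, 6.
$indices = 0..1 + 3..6
$sliced = $tokens[$indices] -join '_'
$sliced

# Hypothetical name with one extra token (index 7): the hard-coded
# upper bound of 6 silently drops the last token.
$longer = '12345_00001_LastName, FirstName_09_2018_Text_More_Text.pdf'
$truncated = ($longer -split '_')[0..1 + 3..6] -join '_'
$truncated
```

The first result is the desired name; the second loses its trailing "Text.pdf" token, which is why the fixed bound is only safe for predictable input.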

As of Windows PowerShell v5.1 / PowerShell Core 6.1.0, in order to determine the upper bound dynamically, you require the help of an auxiliary variable, which is clumsy:

Get-ChildItem *.pdf |
  Rename-Item -NewName { ($arr = $_.Name -split '_')[0..1 + 3..($arr.Count-1)] -join '_' } -WhatIf

Wouldn't it be nice if we could write [0..1 + 3..] instead? This and other improvements to PowerShell's slicing syntax are the subject of this feature suggestion on GitHub.
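
One possible way to sidestep the upper bound altogether, offered as a sketch of a different technique rather than as part of the slicing approach above, is the max-substrings argument of -split: limiting the split to 4 substrings leaves everything after the third underscore in one piece. The variable names are illustrative, and the sketch assumes every name contains at least three underscores.

```powershell
$name = '12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf'

# Limit the split to 4 substrings: token 0, token 1, token 2,
# and the unsplit remainder of the name.
$parts = $name -split '_', 4

# Keep tokens 0 and 1 plus the remainder; token 2 (the name part) is left out.
$newName = $parts[0, 1, 3] -join '_'
$newName
```

Inside Rename-Item this would read -NewName { ($_.Name -split '_', 4)[0, 1, 3] -join '_' }, again best tried with -WhatIf first.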

Upvotes: 2

Vivek Kumar Singh

Reputation: 3350

If you don't want to use regex -

$files = Get-ChildItem *.pdf        # get all PDF files
foreach ($file in $files)
{
    $ModifiedFiles = $file.Name.Split("_")     # split the file name; the FileInfo object itself has no Split method
    $ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] }     # omit the token between the second and third underscore
    $New = $ModifiedFiles -join "_"     # rejoin with underscores; spaces inside the remaining tokens are left alone
    Rename-Item -Path $file.FullName -NewName $New
}

Sample Data -

$files = "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf", "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf", "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
foreach ($file in $files)
{
    $ModifiedFiles = $file.Split("_")     # the sample items are plain strings, so Split can be called directly
    $ModifiedFiles = $ModifiedFiles | Where-Object { $_ -ne $ModifiedFiles[2] }     # omit the token between the second and third underscore
    $New = $ModifiedFiles -join "_"
    $New     # output the new name
}

Upvotes: 3

Wiktor Stribiżew

Reputation: 627083

You may use

-replace '^((?:[^_]*_){2})[^_]+_', '$1'

See the regex demo

Details

  • ^ - start of the string
  • ((?:[^_]*_){2}) - Group 1 (its value is referenced with $1 in the replacement pattern): two repetitions of
    • [^_]* - 0+ chars other than an underscore
    • _ - an underscore
  • [^_]+ - 1 or more chars other than _
  • _ - an underscore
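
A quick way to check the pattern without touching any files is to run it against the three sample names from the question. The sketch below also runs the question's own attempt for comparison; the variable names are illustrative only.

```powershell
# Sample names copied from the question.
$names = @(
    '12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf'
    '12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf'
    '12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'
)

# Anchored pattern: keep the first two "token_" pairs (group 1),
# then consume the third token and the underscore after it.
$pattern = '^((?:[^_]*_){2})[^_]+_'
$renamed = $names -replace $pattern, '$1'
$renamed

# The question's unanchored pattern, for comparison. -replace replaces
# every match, so it fires a second time on "_2018_Text_".
$broken = $names[0] -replace '_(?:[^_]*)_([^_]*)_', '_'
$broken
```

The ^ anchor is what keeps the replacement to a single match per name. With real files, the same -replace expression can go inside Rename-Item -NewName { ... }, ideally with -WhatIf for a dry run first.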

Upvotes: 2
