Brian

Reputation: 85

PowerShell Regex to Replace Underscores with Hyphens

I am trying to find a PowerShell command that searches all files in a directory and replaces underscores with hyphens in relative links only (the link must not start with http).

Here is an example:

<a href="/always_sunny/is_the_best/">

should become

<a href="/always-sunny/is-the-best/">

However, I would like the regex to ignore href values that begin with http. So a link like this should be ignored.

<a href="http://thundergunexpress/always_sunny/"

Below is the PowerShell command and regex I have been working with. The regex partially works in Notepad++: it finds and replaces underscores but doesn't exclude absolute links. In PowerShell, however, it doesn't work at all, and I am not sure whether that is due to the regex or to my limited knowledge of PowerShell. Any help with the PowerShell command and the regex would be greatly appreciated.

Get-ChildItem -Path k:\toolbox\powershell\ -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern '(\bhref="|(?!^)\G)[^"<_]*\K_'){(Get-Content $_ | ForEach {$_ -replace '(\bhref="|(?!^)\G)[^"<_]*\K_', '-'}) | Set-Content $_}}

Upvotes: 3

Views: 822

Answers (1)

Wiktor Stribiżew

Reputation: 627082

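Note that PCRE and .NET regex (the flavor PowerShell uses) differ when it comes to matching multiple occurrences of a pattern between two delimiters. In particular, .NET does not support the \K match-reset operator, so the \G...\K pattern from the question cannot be used as is.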
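As a quick way to see the engine difference for yourself, the sketch below tries to compile the question's PCRE-style pattern with the .NET regex class. The variable names are illustrative only, and [regex]::new assumes PowerShell 5 or later.

```powershell
# The pattern from the question, written for a PCRE-style engine.
$pcrePattern = '(\bhref="|(?!^)\G)[^"<_]*\K_'

$rejected = $false
try {
    # .NET parses the pattern when the Regex object is constructed.
    [void][regex]::new($pcrePattern)
} catch {
    # \K is not a recognized escape in .NET, so construction throws.
    $rejected = $true
    $reason = $_.Exception.Message
}

"Rejected by .NET: $rejected"
```

If $rejected comes back as True, the problem is the regex flavor rather than the surrounding PowerShell pipeline; $reason holds the parser's message.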

An "idiomatic" way to do that in .NET regex is to use a variable-width lookbehind. Here, you can use

(?<=\bhref="(?!http)[^"]*?)_(?=[^"]*")

Details:

  • (?<=\bhref="(?!http)[^"]*?) - a positive lookbehind that matches a location immediately preceded by href=" (as a whole word), where that href=" is not directly followed by http, and then by zero or more chars other than ", as few as possible
  • _ - a _ char
  • (?=[^"]*") - a positive lookahead that requires zero or more chars other than " and then a " char immediately to the right.
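To tie this back to the original task, here is a minimal sketch that first checks the pattern against the two sample links from the question and then wraps it in a function for a directory tree. The function name Update-RelativeLinks and its parameter are hypothetical, not from the question. It assumes PowerShell 5 or later (for -NoNewline) and that each file is small enough to read in one piece. Set-Content writes with the session's default encoding, so test on a copy of the files first.

```powershell
# Pattern from the answer: an underscore inside a double-quoted href value
# that does not start with "http".
$pattern = '(?<=\bhref="(?!http)[^"]*?)_(?=[^"]*")'

# Quick check against the sample links from the question.
$relative = '<a href="/always_sunny/is_the_best/">'
$absolute = '<a href="http://thundergunexpress/always_sunny/">'

$relativeResult = $relative -replace $pattern, '-'   # <a href="/always-sunny/is-the-best/">
$absoluteResult = $absolute -replace $pattern, '-'   # unchanged

# Hypothetical helper: rewrite every file under $Path that contains a match.
function Update-RelativeLinks {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string]$Pattern = '(?<=\bhref="(?!http)[^"]*?)_(?=[^"]*")'
    )
    Get-ChildItem -Path $Path -Recurse -File | ForEach-Object {
        # -Raw reads the whole file as one string instead of line by line.
        $text = Get-Content -LiteralPath $_.FullName -Raw
        if ($text -match $Pattern) {
            # Only touch files that actually contain a matching underscore.
            ($text -replace $Pattern, '-') |
                Set-Content -LiteralPath $_.FullName -NoNewline
        }
    }
}
```

With the path from the question, a call would look like Update-RelativeLinks -Path 'k:\toolbox\powershell\'. Because -match and -replace are case-insensitive by default, href values starting with HTTP are skipped as well.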

Upvotes: 1
