Reputation: 85
I am trying to find a PowerShell command that searches all files in a directory and replaces underscores with hyphens in relative links only (links whose href does not start with http).
Here is an example:
<a href="/always_sunny/is_the_best/">
should become
<a href="/always-sunny/is-the-best/">
However, I would like the regex to ignore href values that begin with http. So a link like this should be ignored.
<a href="http://thundergunexpress/always_sunny/"
Below is the PowerShell command and regex I have been working with. The regex partially works in Notepad++: it finds and replaces underscores, but it doesn't exclude absolute links. In PowerShell it doesn't work at all, and I am not sure whether that is due to the regex or my limited knowledge of PowerShell. Any help with the PowerShell command and the regex would be greatly appreciated.
Get-ChildItem -Path k:\toolbox\powershell\ -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern '(\bhref="|(?!^)\G)[^"<_]*\K_'){(Get-Content $_ | ForEach {$_ -replace '(\bhref="|(?!^)\G)[^"<_]*\K_', '-'}) | Set-Content $_}}
Upvotes: 3
Views: 822
Reputation: 627082
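Note that PCRE and .NET regex differ when it comes to matching multiple occurrences of a pattern between two delimiters. In particular, .NET (which PowerShell uses) does not support the \K match-reset operator that your pattern relies on.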
An "idiomatic" way to do that in .NET regex is to use a non-fixed width lookbehind pattern. Here, you can use
(?<=\bhref="(?!http)[^"]*?)_(?=[^"]*")
See the regex demo. Details:
(?<=\bhref="(?!http)[^"]*?) - a positive lookbehind that matches a location that is immediately preceded with href=" (not followed with http) and then any zero or more chars other than ", as few as possible
_ - a _ char
(?=[^"]*") - immediately followed with zero or more chars other than " and then a " char.
Upvotes: 1
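To tie the pattern back to the original PowerShell question, here is a minimal sketch of how it might be applied to every file under a folder. The helper name Update-RelativeLink is hypothetical (it does not come from either post), and the sketch assumes PowerShell 5 or later, since it relies on Get-Content -Raw and Set-Content -NoNewline. It reads each file as a single string rather than line by line, and only rewrites files that actually contain a match.

```powershell
# Pattern from the answer: an underscore inside an href value that does not start with http.
$pattern = '(?<=\bhref="(?!http)[^"]*?)_(?=[^"]*")'

# Hypothetical helper (not from the original posts): rewrites matching files under $Path in place.
function Update-RelativeLink {
    param([Parameter(Mandatory)][string]$Path)

    Get-ChildItem -LiteralPath $Path -Recurse -File | ForEach-Object {
        # -Raw returns the whole file as one string, so existing line endings are kept.
        $text = Get-Content -LiteralPath $_.FullName -Raw
        # Skip empty files and files without a match so they are left untouched.
        if ($text -and $text -match $pattern) {
            Set-Content -LiteralPath $_.FullName -Value ($text -replace $pattern, '-') -NoNewline
        }
    }
}

# Example call, using the folder from the question:
# Update-RelativeLink -Path 'k:\toolbox\powershell\'
```

Two caveats: -match and -replace are case-insensitive by default, so HREF and HTTP are treated the same as href and http. Set-Content also writes with the shell's default encoding, so if the files are UTF-8 it is probably safer to pass an explicit -Encoding value, and to try the command on a copy of the folder first.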