I am trying to use PowerShell to replace a semicolon (;) with a pipe (|) in a semicolon-separated file, but only a specific set of semicolons: the ones that occur between double quotes ("). Here's a sample line from the file; the semicolons to replace are the ones inside "1975995;1972871;1975":
Camp;Brazil;AI;BCS GRU;;MIL-32011257;172-43333640;;"1975995;1972871;1975";FAC0088/21;3;20.000;24.8;25.000;.149;GLASSES SPARE PARTS,;EXW;C;.00;EUR;
I've tried using -replace, as follows:
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace '".*(;).*"',"|" } |
However, the regex does not replace the semicolons between the quotes with pipes. I've tried several other regexes to no avail. How can I accomplish this?
You can use the Regex.Replace method with a callback (a MatchEvaluator) as the replacement argument:
$s = 'Camp;Brazil;AI;BCS GRU;;MIL-32011257;172-43333640;;"1975995;1972871;1975";FAC0088/21;3;20.000;24.8;25.000;.149;GLASSES SPARE PARTS,;EXW;C;.00;EUR;'
$rx = [regex]'"[^"]*"'
$rx.Replace($s, { param($m) $m.value.Replace(';','|') })
# => Camp;Brazil;AI;BCS GRU;;MIL-32011257;172-43333640;;"1975995|1972871|1975";FAC0088/21;3;20.000;24.8;25.000;.149;GLASSES SPARE PARTS,;EXW;C;.00;EUR;
That is, match any substring between two " chars, and replace all ; chars with | inside those matches only.
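To apply the same callback to a whole file rather than a single string, here is a minimal sketch. It assumes, as the sample suggests, that quoted fields never contain escaped "" quotes and never span multiple lines. The function name Convert-QuotedSemicolons and the temporary input/output files are hypothetical, introduced only to keep the sketch self-contained; the callback form should work in Windows PowerShell 5.1 as well as PowerShell 7.

```powershell
# Hypothetical helper: replace ';' with '|' only inside double-quoted fields.
# Assumes quoted fields contain no escaped quotes ("") and do not span lines.
function Convert-QuotedSemicolons([string]$Line) {
    # Each "..." field is matched as a whole; the script block is converted
    # to a MatchEvaluator and rewrites the semicolons of that match only.
    ([regex]'"[^"]*"').Replace($Line, { param($m) $m.Value.Replace(';', '|') })
}

# Hypothetical temporary files so the sketch runs anywhere.
$inFile  = [System.IO.Path]::GetTempFileName()
$outFile = [System.IO.Path]::GetTempFileName()

Set-Content -Path $inFile -Value @(
    'Camp;Brazil;AI;BCS GRU;;MIL-32011257;172-43333640;;"1975995;1972871;1975";FAC0088/21;3'
    'a;"x;y";b'
)

# Process line by line and write the result to a separate file.
Get-Content -Path $inFile |
    ForEach-Object { Convert-QuotedSemicolons $_ } |
    Set-Content -Path $outFile

$result = Get-Content -Path $outFile
$result

# Clean up the temporary files.
Remove-Item -Path $inFile, $outFile
```

Writing to a separate file avoids clobbering the input. To rewrite a file in place instead, keep the parentheses around Get-Content, as in the question, so the whole file is read into memory before Set-Content opens it for writing.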
Also, here is a PowerShell Core v6.1+ version, where you can pass a script block as the -replace replacement operand and the current match is available as the automatic $_ variable:
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace '"[^"]*"', { $_.Value.Replace(';', '|') } }
Why not use lookarounds?
Since the left- and right-hand delimiters are the same single character, ", any lookaround-based solution will either be wrong or be long and still error-prone. This is because lookarounds do not consume the text they match, so every " can serve as an opening quote, including one that actually closes a field. Consider the (?<="[^"]*);(?=[^"]*") regex: it turns "b;c;d";1;23;"45;677777;z" into "b|c|d"|1|23|"45|677777|z", because the ; right after the first closing ", the ; between 1 and 23, and the ; between 23 and the next " all lie between two double quotation marks (the closing quote of one field and the opening quote of the next).
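The following sketch runs both approaches on that same example string so the difference is visible side by side; the variable names are only illustrative.

```powershell
$s = '"b;c;d";1;23;"45;677777;z"'

# Lookaround approach: each ';' is tested in isolation, so a closing quote
# can also act as the "opening" quote seen by the lookbehind.
$lookaround = $s -replace '(?<="[^"]*);(?=[^"]*")', '|'

# Match-and-callback approach: each quoted field is consumed as a whole,
# so quotes pair up left to right and unquoted ';' chars are left alone.
$callback = ([regex]'"[^"]*"').Replace($s, { param($m) $m.Value.Replace(';', '|') })

"lookaround: $lookaround"   # lookaround: "b|c|d"|1|23|"45|677777|z"
"callback:   $callback"     # callback:   "b|c|d";1;23;"45|677777|z"
```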
A similar problem affects \G-based patterns, which can be used to match multiple occurrences between two different delimiters; these are rarely needed in .NET regex, since .NET supports infinite-width lookbehinds.