SchenkerPaul

Reputation: 23

Powershell regex to replace a specific character between two identical characters

I am trying to use PowerShell to replace semicolons ; with pipes | in a semicolon-separated file, but only the semicolons that occur between double quotes ". Here's a sample line from the file; the portion I want to change is the quoted field "1975995;1972871;1975":

Camp;Brazil;AI;BCS GRU;;MIL-32011257;172-43333640;;"1975995;1972871;1975";FAC0088/21;3;20.000;24.8;25.000;.149;GLASSES SPARE PARTS,;EXW;C;.00;EUR;

I've tried using -replace, as follows:

(Get-Content $file.PSPath) |
    Foreach-Object { $_ -replace '".*(;).*"',"|" } |

However, this regex does not replace the semicolons between the quotes with pipes. I've tried several other regexes to no avail. How can I accomplish this?

Upvotes: 2

Views: 563

Answers (1)

Wiktor Stribiżew

Reputation: 626699

You can use a Regex.Replace method with a callback as the replacement argument:

$s = 'Camp;Brazil;AI;BCS GRU;;MIL-32011257;172-43333640;;"1975995;1972871;1975";FAC0088/21;3;20.000;24.8;25.000;.149;GLASSES SPARE PARTS,;EXW;C;.00;EUR;'
$rx = [regex]'"[^"]*"'
$rx.Replace($s, { param($m) $m.value.Replace(';','|') })
# => Camp;Brazil;AI;BCS GRU;;MIL-32011257;172-43333640;;"1975995|1972871|1975";FAC0088/21;3;20.000;24.8;25.000;.149;GLASSES SPARE PARTS,;EXW;C;.00;EUR;

That is, match any substring between two " chars, and replace all ; chars with | inside the matches only.

In PowerShell Core v6.1+, you can also pass a script block as the replacement operand of -replace; inside the script block, the current match is available as the automatic $_ variable:

(Get-Content $file.PSPath) |
    Foreach-Object { $_ -replace '"[^"]*"', { $_.Value.Replace(';', '|') } }
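To apply the replacement to a whole file and save the result, one option is to pipe the processed lines into Set-Content. The following is only a minimal sketch: the temporary file names and the two sample lines are hypothetical and exist purely for illustration. It uses the Regex.Replace callback form shown above, which should also work in Windows PowerShell 5.1, and it writes to a separate output file so the input is not being read and overwritten at the same time.

```powershell
# Hypothetical input/output paths and sample data, for illustration only
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-in.csv'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-out.csv'
Set-Content -Path $inPath -Value 'a;"1;2;3";b', 'c;"4;5";d'

# Match each quoted field, then replace ; with | inside that match only
$rx = [regex]'"[^"]*"'
Get-Content -Path $inPath |
    ForEach-Object { $rx.Replace($_, { param($m) $m.Value.Replace(';', '|') }) } |
    Set-Content -Path $outPath

# Read the result back
$result = Get-Content -Path $outPath
$result
```

With the sample data above, the quoted fields come back as "1|2|3" and "4|5", while the semicolons that separate the fields are left alone.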

Why not use lookarounds?

Since the left- and right-hand delimiters are the same single character, ", any lookaround-based solution will be either wrong or long and still error-prone. Lookarounds do not consume the text they match, so every " can serve as the opening quote for the lookbehind, including one that actually closes a quoted field. Take the (?<="[^"]*);(?=[^"]*") regex: it turns "b;c;d";1;23;"45;677777;z" into "b|c|d"|1|23|"45|677777|z", because the ; characters around 1 and 23 also have a " somewhere to their left and a " somewhere to their right, with no other " in between.
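The difference can be checked side by side. This is a small sketch using the sample string from the paragraph above; the variable names $lookaround and $callback are chosen only for this illustration.

```powershell
$s = '"b;c;d";1;23;"45;677777;z"'

# Lookaround-based attempt: also replaces the semicolons outside the quoted fields
$lookaround = $s -replace '(?<="[^"]*);(?=[^"]*")', '|'

# Callback approach: only semicolons inside each "..." match are replaced
$callback = [regex]::Replace($s, '"[^"]*"', { param($m) $m.Value.Replace(';', '|') })

$lookaround   # => "b|c|d"|1|23|"45|677777|z"
$callback     # => "b|c|d";1;23;"45|677777|z"
```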

A similar problem affects \G-based patterns, which can be used to match multiple occurrences between two different delimiters. They are rarely needed in .NET regex anyway, since it supports infinite-width lookbehinds.

Upvotes: 4
