Reputation: 147
I want to catch invalid characters inside a .csv file. Currently I can only catch characters that are not English. Is there any way to catch all invalid characters except English and German ones?
The following code flags every character that is not an English (ASCII) character.
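$path = "product.csv"
# Flag every line containing a character outside the ASCII range \x00-\x7F
# (\x79 is the letter y, so a range ending there would also flag z, {, |, } and ~)
$a = @(Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x7F]" | Select-Object LineNumber,Line,@{Name='String';Expression={$_.Matches.Value}})
# @(...) keeps $a an array, so Count is reliable even with a single match
$b = $a.Count
$a
Write-Host "Total: $b"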
All German characters contained in people's names should count as valid characters.
Upvotes: 0
Views: 297
Reputation: 3616
The easiest way would be to add the hex escapes for the German-specific characters to your character class. The characters you are looking for are:
ß \xdf
Ü \xdc
ü \xfc
Ä \xc4
ä \xe4
Ö \xd6
ö \xf6
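So your new character class would be:
-Pattern "[^\x00-\x7f\xdf\xdc\xfc\xc4\xe4\xd6\xf6]"
The range runs to \x7f, the end of ASCII, so that z and the remaining ASCII punctuation ({ | } ~) are treated as valid too.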
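As a rough illustration of how that pattern behaves, here is a minimal sketch that runs it against a few in-memory strings instead of a file. The sample rows and the variable names `$lines`, `$pattern` and `$invalid` are made up for this example. If you run it as a script under Windows PowerShell 5.1, save the script as UTF-8 with a BOM so the literal umlauts are read correctly.

```powershell
# Hypothetical sample rows standing in for the contents of product.csv
$lines = @(
    'Müller,Jürgen,Straße 5',
    'Smith,John,Main St 1',
    'Dvořák,Antonín,Praha'
)

# ASCII (\x00-\x7F) plus the seven German letters are treated as valid
$pattern = '[^\x00-\x7F\xDF\xDC\xFC\xC4\xE4\xD6\xF6]'

# @(...) keeps the result an array even when only one line matches
$invalid = @(
    $lines |
        Select-String -AllMatches -Pattern $pattern |
        Select-Object LineNumber, Line, @{ Name = 'String'; Expression = { $_.Matches.Value } }
)

$invalid
Write-Host "Total: $($invalid.Count)"
```

With these sample rows, only the third one should be reported, because ř, á and í are neither ASCII nor in the German list.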
Edit:
As an alternative to matching characters by their code points you could also use the actual characters in your match pattern:
-Pattern "[^a-zA-ZäÄöÖüÜß]"
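It's easier to read and also leaves out the non-printable control characters (\x00 to \x1f) that the range above accepts. Keep in mind that this class allows letters only, so digits, spaces, commas and other punctuation will be reported as invalid unless you add them to the class as well.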
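Here is a hedged sketch of the literal-character variant applied to a file. Two things in it are assumptions rather than part of the answer above: the file is taken to be UTF-8 (hence `-Encoding UTF8` on `Get-Content`), and the class is extended with digits, a comma and a space, since a CSV row usually contains those. The temporary file name and its sample rows are invented for the example.

```powershell
# Hypothetical temp file standing in for product.csv; UTF-8 is an assumption
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'product-sample.csv'
Set-Content -Path $path -Encoding UTF8 -Value @(
    'Name,City,Qty',
    'Jürgen Groß,Köln,3',
    'Zoë Smith,Leeds,12'
)

# Letters, German letters, digits, comma and space count as valid; extend as needed
$pattern = '[^a-zA-ZäÄöÖüÜß0-9, ]'

# Read with an explicit encoding so the umlauts are decoded correctly
$invalid = @(
    Get-Content -Path $path -Encoding UTF8 |
        Select-String -AllMatches -Pattern $pattern |
        Select-Object LineNumber, Line, @{ Name = 'String'; Expression = { $_.Matches.Value } }
)

$invalid
Write-Host "Total: $($invalid.Count)"

# Clean up the sample file
Remove-Item -Path $path
```

With this sample data, only the third line should be flagged, because of the ë. If your file is stored in a different encoding, pass the matching `-Encoding` value instead.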
Upvotes: 2