Yong Cai

Reputation: 147

Select-String for invalid characters (German language)

I want to catch the invalid characters inside a .csv file. Currently I can only flag every character that is not English. Is there a way to flag all invalid characters except English and German ones?

The following code flags the characters that are not English:

$path = "product.csv"

$a = Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]" | Select-Object LineNumber,Line,@{Name='String';Expression={$_.Matches.Value}}
$b = $a.count

$a
Write-Host "Total:  $b"

All German characters that appear in people's names should count as valid characters.

Upvotes: 0

Views: 297

Answers (1)

Manuel Batsching

Reputation: 3616

The easiest way would be to add the hex escapes for the German-specific characters to your character class. The characters you are looking for are:

 ß \xdf
 Ü \xdc
 ü \xfc
 Ä \xc4
 ä \xe4
 Ö \xd6
 ö \xf6

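So your new character class would be as follows. Note that the range ends at \x7F rather than \x79: \x79 is the letter y, so the original range would flag z and the ASCII punctuation { | } ~ as invalid.

-Pattern "[^\x00-\x7F\xdf\xdc\xfc\xc4\xe4\xd6\xf6]"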
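Combined with the script from the question, a minimal sketch could look like the one below. The sample lines are made up and only stand in for the contents of product.csv. The range is written as \x00-\x7F on the assumption that all of ASCII is meant to be valid. The variable names $lines, $pattern, and $hits are illustrative.

```powershell
# Hypothetical sample data standing in for Get-Content product.csv.
# Non-ASCII letters are built from code points so the script stays pure ASCII.
$ok    = "1,J$([char]0xFC)rgen Wei$([char]0xDF)"   # German letters only
$bad   = "2,Fran$([char]0xE7)ois"                  # c with cedilla is not in the class
$lines = 'id,name', $ok, $bad

# All of ASCII plus the seven German-specific characters.
$pattern = '[^\x00-\x7F\xdf\xdc\xfc\xc4\xe4\xd6\xf6]'

# @() keeps the result an array even when only one line matches.
$hits = @($lines | Select-String -AllMatches -Pattern $pattern |
    Select-Object LineNumber, Line, @{Name='String'; Expression={ $_.Matches.Value }})

$hits
Write-Host "Total:  $($hits.Count)"
```

With a real file, replacing $lines with Get-Content $path should give the same shape of output as the original script.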

Edit:

As an alternative to matching characters by their code points, you could also use the actual characters in your match pattern:

-Pattern "[^a-zA-ZäÄöÖüÜß]"

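It's easier to read, and it no longer accepts the non-printable control characters between \x00 and \x1F that the range above treats as valid. Keep in mind that this class also flags digits, spaces, and punctuation such as the comma delimiter, so add whichever of those are valid in your data.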
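As a rough sketch of the letter-based variant for CSV input, the class below also allows digits and a few punctuation characters. Which punctuation counts as valid is an assumption here, as are the sample lines. The German letters are built from code points only because literal umlauts can depend on how the script file is saved. -CaseSensitive is added because Select-String ignores case by default, and this way the class is matched exactly as written.

```powershell
# German letters from code points: a-umlaut, A-umlaut, o-umlaut, O-umlaut,
# u-umlaut, U-umlaut, sharp s.
$german = -join ([char[]](0xE4, 0xC4, 0xF6, 0xD6, 0xFC, 0xDC, 0xDF))

# Letters, digits, and characters assumed valid in this CSV:
# space, comma, dot, hyphen, double quote.
$pattern = '[^a-zA-Z0-9 ,.\-"{0}]' -f $german

# Hypothetical sample data; the last line contains a tab control character.
$lines = 'id,name', "1,J$([char]0xFC)rgen", "2,Fran$([char]0xE7)ois", "3,tab`there"

$hits = @($lines | Select-String -CaseSensitive -AllMatches -Pattern $pattern)
$hits | Select-Object LineNumber, Line
Write-Host "Total:  $($hits.Count)"
```

Here the header and the German name pass, while the line with the cedilla and the line with the tab are flagged.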

Upvotes: 2
