Tomer Peled
Tomer Peled

Reputation: 3591

Identify problematic characters in a string

I want to be able to identify problematic characters in a string saved in my sql server using LINQ to Entities.

Problematic characters are characters which had problem in the encoding process.

This is an example of a problematic string : "testing�stringáאç".

In the above example only the � character is considered as problematic.

So for example the following string isn't considered problematic:"testingstringáאç". How can I check this Varchar and identify that there are problematic chars in it?

Notice that my preferred solution is to identify it via a LINQ to entities query , but other solutions are also welcome - for example: some store procedure maybe?

I tried to play with Regex and with "LIKE" statement but with no success...

Upvotes: 0

Views: 395

Answers (3)

devio
devio

Reputation: 37205

.Net and NVARCHAR both use Unicode, so there is nothing inherently "problematic" (at least not for BMP characters).

So you first have to define what "problematic" in meant to mean:

  • characters are not mapped in target codepages

Simply convert between encodings and check whether data is lost:

CONVERT(NVARCHAR, CONVERT(VARCHAR, @originalNVarchar)) = @originalNVarchar

Note that you can use SQL Server collations using the COLLATE clause rather than using the default database collation.

  • characters cannot be displayed due to the fonts used

This cannot be easily done in .Net

Upvotes: 1

Sebastian Meine
Sebastian Meine

Reputation: 11773

You can do something like this:

DECLARE @StringWithProblem NVARCHAR(20) = N'This is '+NCHAR(8)+N'roblematic';
DECLARE @ProblemChars NVARCHAR(4000) = N'%['+NCHAR(0)+NCHAR(1)+NCHAR(8)+']%'; --list all problematic characters here, wrapped in %[]%

SELECT PATINDEX(@ProblemChars, @StringWithProblem), @StringWithProblem;

That gives you the index of the first problematic character or 0 if none is found.

Upvotes: 1

Matthew Watson
Matthew Watson

Reputation: 109567

Check out the Encoding class.

It has a DecoderFallback Property and a EncoderFallback Property that lets you detect and substitute bad characters found during decoding.

Upvotes: 2

Related Questions