Reputation: 3591
I want to be able to identify problematic characters in a string saved in my sql server using LINQ to Entities.
Problematic characters are characters which had problem in the encoding process.
This is an example of a problematic string : "testing�stringáאç".
In the above example only the � character is considered as problematic.
So for example the following string isn't considered problematic:"testingstringáאç". How can I check this Varchar and identify that there are problematic chars in it?
Notice that my preferred solution is to identify it via a LINQ to entities query , but other solutions are also welcome - for example: some store procedure maybe?
I tried to play with Regex and with "LIKE" statement but with no success...
Upvotes: 0
Views: 395
Reputation: 37205
.Net and NVARCHAR both use Unicode, so there is nothing inherently "problematic" (at least not for BMP characters).
So you first have to define what "problematic" in meant to mean:
Simply convert between encodings and check whether data is lost:
CONVERT(NVARCHAR, CONVERT(VARCHAR, @originalNVarchar)) = @originalNVarchar
Note that you can use SQL Server collations using the COLLATE clause rather than using the default database collation.
This cannot be easily done in .Net
Upvotes: 1
Reputation: 11773
You can do something like this:
DECLARE @StringWithProblem NVARCHAR(20) = N'This is '+NCHAR(8)+N'roblematic';
DECLARE @ProblemChars NVARCHAR(4000) = N'%['+NCHAR(0)+NCHAR(1)+NCHAR(8)+']%'; --list all problematic characters here, wrapped in %[]%
SELECT PATINDEX(@ProblemChars, @StringWithProblem), @StringWithProblem;
That gives you the index of the first problematic character or 0 if none is found.
Upvotes: 1
Reputation: 109567
Check out the Encoding class.
It has a DecoderFallback Property and a EncoderFallback Property that lets you detect and substitute bad characters found during decoding.
Upvotes: 2