Reputation: 1689
I've just received a (MSSQL) database, which unfortunately has corrupted data in it. In some datasets German special characters (ä/ü/ö) were substitued for a � symbol (Unicode: \uFFFD).
This is not a problem with the collation or the use of various encodings or anything (I checked that already), but rather the fault of whoever provided the orignal data, as I am pretty sure that these symbols are 'hardcoded' into the data meaning that they were "always" there and probably came into existence by a failed migration (or something similar) by the data provider.
As I can't get my hands on a proper dump without these errors, I am currently looking for the possibility to select only the datasets affected. I already tried a [...] WHERE LIKE '%\uFFFD%' statement (hoping that the DB internally decodes these symbols to the unicode code), but to no avail. [...] WHERE LIKE '%�%' works partially, as it shows all affected columns, but also all datasets containing a normal ? character (which makes this method not usable for me, as the output is far too big).
Does anyone know if there is a possibility to specifically only select the datasets containing the �?
Best regards, daZza
Upvotes: 1
Views: 515
Reputation: 69789
The results will depend on your collation settings. I was able to get it to work with a few different ones. I think the most generic is this:
SELECT *
FROM (VALUES (N'A�A'), (N'A?A'), (N'AAA'), (N'�A'), (N'A�'), (N'�')) t (c)
WHERE CHARINDEX(N'�' COLLATE Latin1_General_BIN, C) > 0;
Or
SELECT *
FROM (VALUES (N'A�A'), (N'A?A'), (N'AAA'), (N'�A'), (N'A�'), (N'�')) t (c)
WHERE C LIKE N'%�%' COLLATE Latin1_General_BIN;
When using LIKE it only seems to work with collations that use a binary sort (although I have not tried all collations!). You can get these using:
SELECT name, description
FROM fn_helpcollations()
Upvotes: 1