shivadarshan
shivadarshan

Reputation: 946

how to select rows that contains non-english characters in sql server 2005(it should filter only non-english chars, not special characters)

As my table contains non-English(contains characters in different languages) characters and special characters in a column. I need filter only non-English characters. It should filter any special characters.

i tried using different methods to filter but failed to filter few rows. someone please help me on this. Thanks in advance.

ex: column name LOCATION contains following rows :

row 1: துய இம்மானுவேல் தேவாலயம், North Street, Idyanvillai, Tamil Nadu, India

row 2:Dr.Hakim M.Asgar Ali's ROY MEDICAL CENTRE™ Unani Clinic In Kerala India, Thycaud Hospital Road, Opp. Amritha Hotel,, Thycaud.P.O.,, Thiruvananthapuram, Kerala, India

row 3: ಕಾಳಿಕಾಂಬ ದೇವಿ ದೇವಸ್ಥಾನ, Shivaji Nagar, Davangere, Karnataka, India

As the above contains characters in many language. can any one help me to select only row 2 thanks.

Upvotes: 8

Views: 50711

Answers (4)

Emre
Emre

Reputation: 23

I used this for converting emails which has non-english characters to NULL. (MS SQL)

CREATE FUNCTION ufn_character_test(@kontrol nvarchar(MAX))
RETURNS int
AS
BEGIN
DECLARE @intFlag INT
SET @intFlag = 1
DECLARE @intFlag2 INT
SET @intFlag2 = 1
DECLARE @SonucFlag INT
SET @SonucFlag=0
DECLARE @SonucFlag2 INT
SET @SonucFlag2=0
WHILE (@intFlag <=LEN(@kontrol))
BEGIN
WHILE (@intFlag2 <=62)
BEGIN
IF CAST(SUBSTRING(@kontrol, @intFlag, 1) as varbinary(2))
=CAST(SUBSTRING('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', @intFlag2, 1) as varbinary(2))
SET @SonucFlag=@SonucFlag+1
SET @intFlag2 = @intFlag2 + 1
END
SET @intFlag = @intFlag + 1
SET @intFlag2 = 1
END
IF @SonucFlag=LEN(@kontrol)
SET @SonucFlag2=1
ELSE
SET @SonucFlag2=0
RETURN @SonucFlag2
END


USE master;
GRANT EXEC ON dbo.ufn_character_test TO PUBLIC


SELECT ADR_EMAIL,
CASE WHEN ADR_EMAIL NOT LIKE '%[%]%'
AND ADR_EMAIL NOT LIKE '%@%@%%'
AND ADR_EMAIL NOT LIKE '.%'
AND ADR_EMAIL NOT LIKE '%.'      
AND dbo.ufn_character_test(REPLACE(REPLACE(REPLACE(REPLACE(ADR_EMAIL,'.',''),'@',''),'_',''),'-',''))=1
AND ADR_EMAIL LIKE '__%@%__.__%'
THEN ADR_EMAIL ELSE NULL END AS EMAIL
FROM EMAILTABLE

Upvotes: 1

Mohammad Anini
Mohammad Anini

Reputation: 5230

If you have all the allowed special characters, following select statement should select all columns with only English and the allowed special characters:

select column_name from table_name where column_name like '%[^a-z, .-™]%';

You can add all the allowed special characters inside the square brackets.

Upvotes: 8

shivadarshan
shivadarshan

Reputation: 946

Got a great answer for my question.

     select ID, LATITUDE, LONGITUDE, REFERENCE, LOCATION, VALIDATE,
     patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION) as [Position],
     substring(LOCATION,patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION),1) as [InvalidCharacter],
     ascii(substring(LOCATION,patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION),1)) as [ASCIICode]from  dbo.RADAR_SEARCH where patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION) >0

EDIT1: Explanation for above answer

Above query filters only non-English chars in Location column in which special characters are not included.

NOTE: Tested only in MS-SQL.

Upvotes: 4

Robert Calhoun
Robert Calhoun

Reputation: 5133

T-SQL's string-handling capability is pretty rudimentary.

If the "non-English" fields are distinguished by their use of Unicode UTF-16, you can try something like

SELECT * FROM MyTable WHERE MyField = Cast(MyField AS VARCHAR)

to pull only rows that are expressible in UTF-8.

The only way I know how to test whether a field is drawn from an arbitrary set of characters is with a user-defined function, like this:

CREATE FUNCTION IsAllowed (@input VARCHAR(MAX)) RETURNS BIT
-- Returns 1 if string is allowed, 0 otherwise.
-- Usages: SELECT dbo.IsAllowed('Hello'); -- returns 1
--         SELECT dbo.IsAllowed('Hello, world!'); -- returns 0
-- Note CHARINDEX is not case sensitive so @allowables doesn't need both.
--      VARCHAR(MAX) is different under SQL Server 2005 than 2008+
---     and use of defined VARCHAR size might be necessary.
AS
BEGIN
  DECLARE @allowables char(26) = 'abcdefghijklmnopqrstuvwxyz';
  DECLARE @allowed int = 0; 
  DECLARE @index int = 1;
  WHILE @index <= LEN(@input)
    BEGIN
    IF CHARINDEX(SUBSTRING(@input,@index,1),@allowables)=0
      BEGIN
      SET @allowed = 0;
      BREAK;
      END
    ELSE
      BEGIN
      SET @allowed = 1;
      SET @index = @index+1;
      END
    END
  RETURN @allowed
END

User-defined functions can be applied to columns in SELECT, like this:

SELECT * FROM MyTable WHERE dbo.IsAllowed(MyField) = 1

Note the schema name (dbo in this case) is not optional with user-defined functions.

If a T-SQL user-defined function is inadequate, you can also use a CLR Function. Then you could apply a regexp or whatever to a column. Because they break portability and pose a security risk, many sysadmins don't allow CLR functions. (This includes Microsoft's SQL Azure product.)

Upvotes: 15

Related Questions