Reputation: 791
All,I am trying to create a T sql scalar function that removes any extended ascii characters in a passed in string value.I am doing this using the PatIndex() function as below.One thing I noticed is that the T sql Char() and ascii() functions return a different character/decimal value when invoked with the correct inputs from whats mentioned in the asciitable.com .Any idea why this is diffrent? The collation used on the database(Sql server 2012) is SQL_Latin1_General_CP1_CI_AS
I use this url for copying and pasting extended ascii characters http://www.rapidtables.com/code/text/ascii-table.htm
SELECT
ASCII('Ñ') AS tsql_decimal_value,
165 AS value_from_ascii_table
SELECT
CHAR(165) AS tsql_value,
'Ñ' AS value_from_asciitable
My Tsql Function
CREATE FUNCTION dbo.udf_Remove_NON_ASCII_characters (@value AS nvarchar(2000))
RETURNS varchar(2000)
AS
BEGIN
DECLARE @incorrectcharLoc smallint --Position of bad character
DECLARE @pattern varchar(140) --Bad characters to look for
SELECT
@pattern = '%[' + CHAR(128) + .... + CHAR(255) + ']%'
SELECT
@incorrectcharLoc = PATINDEX(@pattern, @value)
WHILE @incorrectcharLoc > 0
BEGIN
SELECT
@value = STUFF(@value, @IncorrectcharLoc, 1, '')
SELECT
@IncorrectcharLoc = PATINDEX(@Pattern, @value)
END
RETURN @value
END
Please see attched screenshot for sql server result sets
Any help would be really appreciated
Upvotes: 0
Views: 869
Reputation: 1298
Having reached my own computer, I first checked the documentation on CHAR() and ASCII().
CHAR() returns an INT ASCII character code
ASCII() returns the ASCII code value of the leftmost character of a character expression
With these presuppositions and for kicks using the letter ñ
, I ran the following statement:
SELECT UPPER('ñ') AS Character -- returns Ñ
, ASCII( UPPER('ñ') ) ASCII_Value -- returns 209
, CHAR( ASCII( UPPER('ñ') ) ) -- returns Ñ
, CHAR(165) AS CHAR_VALUE2 -- returns ¥
Notice that the ASCII code for the value Ñ is 209, and not 165. Also note that there are disparate unofficial versions that list the ASCII codes, such as ASCII-code.com, which does not list Ñ in the first 200 INT values. The trouble is unofficial sources.
Therefore, we can conclude:
One solution could be to build out your own mapping of the ASCII tables and reference that. Whatever you choose, SQL Server uses the implementation of the ISO ASCII standard, so you might as well make SQL the SOT for these issues.
Upvotes: 1