Rosie Thomas

Reputation: 63

SQL column of type varchar creates � symbols when read into SAS

I am reading data into SAS from a SQL table using a basic libname and data step. However, certain 'special characters' (in this case a dash) are read as a black diamond with a question mark (�).

I am aware that this is caused by an encoding issue - the SQL column has a varchar datatype, and SAS cannot read this properly (details of why this happens would be appreciated). A solution I am aware of is changing the column to type nvarchar; however, I do not own the database so cannot change this.

I have tried various options relating to encoding, inencoding and outencoding (in the libname and data step) but cannot get the right combination, if there is one.

My current workaround is to create a view which uses CAST to convert the data type, and reading the view into SAS. However, I am convinced there must be a coding solution - does anyone know?

Upvotes: 0

Views: 1113

Answers (1)

Stu Sztukowski

Reputation: 12909

In Unicode, "U+FFFD � REPLACEMENT CHARACTER" is used to replace an unknown, unrecognized or unrepresentable character. If this is the only character causing you issues, you can simply convert it into a dash.

As an example, let's replace � values with a dash:

data have;
    length character $20;       /* room for the longest sample value */
    infile datalines dlm=',';
    input character $;
    datalines;
Sugar�free
Camera�ready
Custom�built
;
run;

data want;
    set have;

    character = tranwrd(character, '�', '-');
run;

If that doesn't work, here is an alternative option.

Step 1: Find a single example of the character and get its ASCII hex code

data hex_code;
    set have(obs=1);
    /* position 6 is where the bad character sits in the first row */
    ascii_hex = put(substr(character, 6, 1), $hex.);
run;

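In this case, the hex code is 1A, which is the ASCII substitute (SUB) control character that transcoding routines commonly emit for a character they cannot convert. We can use this as a hex literal to replace the offending character.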
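If you do not know in advance where the bad character sits, dumping the whole value as hex may be easier than guessing a position. This is only a minimal sketch, assuming the same have dataset as above; hex_dump and full_hex are names made up for the illustration.

```sas
data hex_dump;
    set have(obs=1);
    length full_hex $40;
    /* $hex40. writes two hex digits per byte of the 20-byte variable */
    full_hex = put(character, $hex40.);
    /* print the raw value and its hex form to the log for inspection */
    put character= full_hex=;
run;
```

In an ASCII-based session ordinary letters show up as familiar pairs (for example 53 for "S") and trailing blanks as 20, so the offending byte is the pair that does not fit the rest of the word.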

Step 2: Use tranwrd with the hex literal that you found

SAS treats '1A'x as a hexadecimal character literal and searches for that byte in the string. If it is found, tranwrd replaces every instance with a dash.

data want;
    set have;

    character = tranwrd(character, '1A'x, '-');
run;
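The CAST workaround mentioned in the question might also be possible without creating a view, by sending the CAST to the database through explicit SQL pass-through. The sketch below assumes SAS/ACCESS Interface to ODBC and a SQL Server source; my_dsn, dbo.my_table, text_col and the nvarchar(200) length are all hypothetical placeholders, not details from the question.

```sas
proc sql;
    /* "my_dsn" is a placeholder ODBC data source name */
    connect to odbc as src (datasrc="my_dsn");

    /* the query inside CONNECTION TO runs on the database, not in SAS */
    create table want as
    select *
    from connection to src (
        select cast(text_col as nvarchar(200)) as text_col
        from dbo.my_table
    );

    disconnect from src;
quit;
```

Whether the cast values then display correctly is likely to depend on the SAS session encoding, so treat this as something to try rather than a guaranteed fix.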

Upvotes: 1
