Reputation: 1599
Firstly I am not a SAS programmer, so forgive me if this question is too easy or is difficult to follow!
I have an application which creates UTF-8 encoded XML files (and map files) that are to be read into SAS (9.3). These files can contain characters such as the following (note the less-than-or-equal sign):
<DocumentElement>
<DATA>
<TEXT>≤ 50 %</TEXT>
</DATA>
</DocumentElement>
We have an external third party attempting to read these files, but I understand that SAS's default encoding is Wlatin1.
Based on the SAS docs, I have suggested several sets of options for them to specify when reading these files, but I can't seem to find the correct combination of encoding options. Basically, I want to import the XML, with a given MAP, into a SAS dataset while preserving the UTF-8 characters.
Assuming we are using libname xml, the docs suggest the following to read the xml:
filename NHL 'C:\My Documents\XML\NHL.xml';
filename MAP 'C:\My Documents\XML\NHL.map';
libname NHL xml xmlmap=MAP;
proc print data=NHL.TEAMS;
run;
Which statements do I have to apply encoding options to? (I have tried the libname statement with XMLENCODING, INENCODING and OUTENCODING.)
Reputation: 10411
Whichever encoding your SAS session uses, you can use the filename statement's encoding= option to tell SAS which encoding the external file uses. It does not affect the encoding used to write the data to a SAS table, but it makes sure the input files are read correctly.
filename NHL 'C:\My Documents\XML\NHL.xml' encoding="utf-8";
filename MAP 'C:\My Documents\XML\NHL.map' encoding="utf-8";
Note however that SAS expects utf-8 BOM characters to be present.
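Combined with the statements from the question, this might look like the sketch below. It assumes the same paths, the NHL and MAP filerefs, and the TEAMS table used in the question, and it has not been run against the actual files.

```sas
/* Declare both external files as UTF-8 so SAS reads them correctly */
filename NHL 'C:\My Documents\XML\NHL.xml' encoding="utf-8";
filename MAP 'C:\My Documents\XML\NHL.map' encoding="utf-8";

/* Assign the XML library exactly as in the question, using the map file */
libname NHL xml xmlmap=MAP;

/* Print the imported table to check that the characters look right */
proc print data=NHL.TEAMS;
run;
```

Keep in mind that the data is still transcoded to the session encoding when it is stored, and a character such as ≤ has no equivalent in wlatin1, so it may not survive if the session itself is not UTF-8.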
Reputation: 1599
OK, I think I figured this out.
It turns out SAS has a session encoding, and it will try to transcode the data to that encoding if the input files do not match it. Running SAS with a session encoding of UTF-8 avoids all of these issues, and you can then specify the ENCODING= option for any files that need it (which I don't have to, as mine are already UTF-8).
SAS have a paper about this here.
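For anyone wanting to check which session encoding they are running with, something like the following should work. This is only a sketch: the session encoding is fixed when SAS starts, and the startup command shown in the comment is an assumption about a typical setup, so the exact way to launch a UTF-8 session depends on the installation.

```sas
/* Assumption: the session was started with a UTF-8 encoding, for example
   sas -encoding utf-8
   ENCODING is a startup option and cannot be changed in a running session. */

/* Show the current session encoding in the log */
proc options option=encoding;
run;

/* The same value is available in an automatic macro variable */
%put Session encoding is &sysencoding;
```

If the log reports wlatin1 rather than utf-8, the session will transcode the UTF-8 input as described above.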