Pmmoks
Pmmoks

Reputation: 55

Preserving newlines (line feeds) in SAS when importing from excel

I am importing a dataset from excel using the built in import data wizard. However, when viewing the data in SAS, cells with newlines have all line feeds (alt+Enter) replaced with a period (.)

For example, in excel:
"Example text
with new line"

will be read in by SAS as:
"Example text.with new line"

Usually line feeds or carriage returns are replaced by spaces, where the hex code (if you format the text as hex) is 0A. When I convert the text in excel to hex in excel using a formula, the new line feeds also show up as 0A.

However, the hex code for the period in my text (what used to be a line return in excel) is 2E, rather than the expected 0A. This prevents me from differentiating them from normal full stops, which means there's no possible workaround. Has anyone else come across this issue? Is there an option to change/set the default line feed replacement character in SAS?

My import code (variables replaced with 'text' for simplicity) for reference:

data work.table;
    length
        text $ 50;
    label
        text = "Text"
    format
        text $CHAR50;
    informat
        text $CHAR50;
    infile 'path/to/file'
        lrecl=1000
        encoding='LATIN9'
        termstr=CRLF   
        dlm='7F'x
        missover
        dsd;
    input
        text $CHAR50;
run;

Upvotes: 2

Views: 1531

Answers (2)

Joe
Joe

Reputation: 63434

I would not recommend using the Import Wizard; there are far better tools nowadays. EG's import wizard is unique in SAS tools in how it works, and really was meant only to supply a way for data analysts who were not programmers to quickly bring in data; it's not robust enough for production work.

In this case, what's happening is that SAS's method for reading the data in is very rudimentary. What it does is convert it to a delimited file, and it doesn't handle LF characters very cleanly there. Instead of keeping them, which would be possible but is riskier (remember, this has to work for any incoming file), what it does is convert those to periods.

You'll see that in the notes in the program it generates:

Some characters embedded within the spreadsheet data were translated to alternative characters so as to avoid transmission errors.

It's referring to the LF character in that case.

The only way to get around this that I'm aware of is to either:

  • Convert the file to CSV from Excel yourself, and then read it in
  • Use ACCESS to PC FILES (via PROC IMPORT, or the checkbox in the import wizard)

Either of those will allow you to read in your line feed characters.

Upvotes: 0

Richard
Richard

Reputation: 27508

SAS Viewer will not render so called non-printables (characters <= '1F'x) and does not display carriage return characters as a line break.

Example:

Excel cell with two line breaks in the data value

enter image description here

Imported with

proc import datafile='sample.xlsx' out=work.have;
run;

and viewed in standard SAS data set viewer (viewtable) appear to have lost the new lines.

enter image description here

Rest assured they are still there.

proc print data=have;
  var text / style = [fontsize=14pt];
  format text $hex48.;
run;

enter image description here

Upvotes: 0

Related Questions