Reputation: 4564
Questions: How do I define the variable type of variables being imported from a .xlsx file when using PROC IMPORT?
My work
I am using SAS v9.4. So far as I'm aware, it is vanilla SAS. I do not have SAS/ACCESS etc.
My data looks like this:
ID1 ID2 MONTH YEAR QTR VAR1 VAR2
ABC_1234 1 1 2010 1 869 3988
ABC_1235 12 2 2010 1 639 3144
ABC_1236 13 3 2010 2 698 3714
ABC_1237 45 4 2010 2 630 3213
The procedure I am running is:
proc import out=rawdata
datafile = "c:\rawdata.xlsx"
dbms = xlsx replace;
format ID1 $9. ;
format ID2 $3. ;
format MONTH best2. ;
format YEAR best4. ;
format QTR best1. ;
format VAR1 best3. ;
format VAR2 best4. ;
run;
When I run this step, I get the following log output:
ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.
What this seems to tell me is that SAS automatically assigns the variable type. I want to be able to control it manually. I cannot find documentation which explains how to do this. INFORMAT, LENGTH, and INPUT statements do not seem to work for PROC IMPORT.
I am using PROC IMPORT because it has yielded the greatest success with .xlsx files overall. Two possible solutions I can think of are 1) convert .xlsx to .csv and use INFILE in a DATA step and 2) bring the data in as numeric and convert it to character in a later step. I dislike the first solution because it requires me to manually manipulate the data, a potential source of error (such as leading zeros being removed). I dislike the second because it may unintentionally introduce errors (again, such as with leading zeros) and introduces extraneous work.
Upvotes: 4
Views: 21739
Reputation: 4564
I solved this by not using PROC IMPORT
. It isn't a solution for everyone, but it worked great for my purposes (i.e. not "big data"). If you're reading from an Excel spreadsheet, it should work for you.
ImportDataFile
is a macro1 which automates a data step import. A data step import requires a LENGTH
statement to define variable names and types, an INPUT
statement to read raw data from an external file, and an INFILE
statement to specify which file.
data &dataset.;
&infileStatement.;
length &lengthStatement. ;
input (_all_) (:) ;
run;
The macro is composed of three primary steps:
Notice how each of those corresponds to the three lines in the data step. Everything in the macro is in support of that data step.
In my experience, it was best to import the data as fixed width characters and then convert to whatever type was needed in a separate step. Yeah, it's redundant, but I never ran into issues with memory or space. The benefits far outweighed any hypothetical concerns. It made the data flow identical for each analysis which aided verification and saved time overall by avoiding the need to correct SAS's guess at the type (and inevitable silent truncation).
Because SAS is a horribly verbose language, this answer runs up against the StackOverflow answer character limit. A fully documented copy is here: https://pastebin.com/raw/RsXz3juJ Put the code in a file named something like ImportDataFile.sas
and make sure it runs (probably using %include
) before the macro is called. The call form is:
%ImportDataFile(
dirData=
, fileName=
, dataset=
, delimiter=
, overOption=
, headerRow=
, sheet=
, range=
, prefix=
, case=
, defLength=
);
where
Output(s) : SAS dataset, macro variable &listHeader
Inputs : dirData= Directory containing data file.
fileName= Filename including file extension. Must be
.csv, .txt, .tsv, .xls, or .xlsx.
dataset= Name of dataset output to WORK library.
delimiter= (optional) Delimiting string given in
quotes. Default for CSV is a comma, for
TXT/TSV a tab. This parameter may not be
set for Excel files. Doing so generates a
warning.
overOption= (optional) INFILE option. Default is
MISSOVER. Other choices are FLOWOVER,
STOPOVER, TRUNCOVER, or SCANOVER.
headerRow= (optional) Row corresponding to header in
an Excel file. Accepts R#C#:R#C#, but
should be given as R#. Default is R1.
sheet= Name of worksheet. Required for XLS or XLSX.
range= Range of spreadsheet to be imported.
Required for XLS and XLSX. Use form
R#C#:R#C#. See example below.
prefix= (optional) String to append to beginning of
each variable name. Default is no prefix.
case= (optional) Toggle mix case variable naming.
Must be lower/upper/mixed. Default is
lower.
defLength= (optional) Character field length. Default
value is 100.
For example, the following creates a dataset named xl_import
of character type with width 100 from a my_xl_file.xlsx
located in C:\Path\To\File
. The columns are prefixed with the string "raw_". The overOption
corresponds to those defined in the INFILE
statement.
%ImportDataFile(
dirData= C:\Path\To\File
, fileName= my_xl_file.xlsx
, dataset= xl_import
, prefix= raw_
, sheet= Sheet1
, range= R2C1:R13C18
, defLength= 100
, overOption= MISSOVER
);
Here is the code for the macro. Enjoy.
********************************************************************
** Utilities / Sub Macros
********************************************************************;
%macro ClearFileRef(fileRef);
filename &fileRef. clear;
%mend;
%macro CompareVariablesToDDERange();
%local columnIndex numberOfDDEColumns;
%let columnIndex = %eval(%sysfunc(findc(&range., 'C', ib)) + 1);
%let numberOfDDEColumns = %sysfunc(substr(&range., &columnIndex));
%if %ListLength(&listHeader) ^= &numberOfDDEColumns %then
%put WARNING: [MACRO] Data file contains %ListLength(&listHeader) variables. RANGE argument has &numberOfDDEColumns columns.;
%mend;
%macro EstablishSystemLink(fileRef);
filename &fileRef. dde 'excel|system';
%mend;
%macro EstablishWorkbookLink(fileRef, dirData, fileName, sheetName, range);
filename &fileRef. dde "excel|&dirData.\[&fileName.]&sheetName.!&range.";
%mend;
%macro IsEmpty(macroVariable);
%sysevalf(%superq(¯oVariable)=, boolean)
%mend;
%macro IsFileRef(reference);
%local fileRefExists externalFileExists returnValue;
%let fileRefExists = %sysfunc(fexist(&reference.));
%let externalFileExists = %sysfunc(fileexist(&reference.));
%if &fileRefExists. = 1 and &externalFileExists. = 0 %then %let returnValue = 1;
%else %let returnValue = 0;
&returnValue
%mend;
%macro IsFilePath(reference);
%local fileRefExists externalFileExists returnValue;
%let fileRefExists = %sysfunc(fexist(&reference.));
%let externalFileExists = %sysfunc(fileexist(&reference.));
%if &fileRefExists. = 0 and &externalFileExists. = 1 %then %let returnValue = 1;
%else %let returnValue = 0;
&returnValue
%mend;
%macro GetObsCount(dataset);
%local exists returnValue closed;
%let exists = %sysfunc(open(&dataset));
%if &exists. %then %do;
%let returnValue = %sysfunc(attrn(&exists, nobs));
%let closed = %sysfunc(close(&exists));
%end;
%else %do;
%put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
%abort cancel;
%end;
&returnValue
%mend;
%macro GetVarCount(dataset);
%local exists varCount closed;
%let exists = %sysfunc(open(&dataset));
%if &exists. %then %do;
%let varCount = %sysfunc(attrn(&exists, nvars));
%let closed = %sysfunc(close(&exists));
%end;
%else %do;
%put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
%abort cancel;
%end;
&varCount
%mend;
%macro ListLength(list);
%local count;
%if %sysevalf(%superq(list)=, boolean) %then %let count = 0;
%else %let count = %eval(%sysfunc(countc(&list., |)) + 1);
&count
%mend;
%macro ListElement(list, n);
%local nthElement;
%let nthElement = %sysfunc(scan(%superq(&list.), &n., |, m));
&nthElement
%mend;
%macro RemoveAllFormattingFromSheet(fileRef, sheet);
data _null_;
file &fileRef.;
/* Select sheet of interest */
put "[WORKBOOK.ACTIVATE(""&sheet."")]";
/* Select first cell */
put '[FORMULA.GOTO("R1C1")]';
/* Apply dummy filter of ">2" to first column */
put '[FILTER(1, ">2")]';
/* Disable filters */
put '[FILTER()]';
/* Select all */
put '[SELECT("R[0]C[0]:R[1048575]C[16383]", "R[0]C[0]")]';
/* Unhide rows */
put '[ROW.HEIGHT(,,TRUE, 2)]';
/* Unhide columns */
put '[COLUMN.WIDTH(,,TRUE, 2)]';
/* Remove all formatting */
put '[CLEAR(2)]';
/* Autofit column width */
put '[COLUMN.WIDTH(,,TRUE, 3)]';
run;
%mend;
%macro SetSystemOptions(opt1, opt2, opt3);
options &opt1. &opt2. &opt3.;
%mend;
%macro ImportDataFile(dirData=, fileName=, dataset=, delimiter=, overOption=MISSOVER, headerRow=R1, sheet=, range=, prefix=, case=lower, defLength=100) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ImportDataFile(dirData=&dirData, fileName=&fileName, dataset=&dataset, delimiter=&delimiter, overOption=&overOption, headerRow=&headerRow, sheet=&sheet, range=&range, prefix=&prefix, case=&case, defLength=&defLength);
%local
macroStart
case
extension
HeaderRef
lengthStatement
delimiter
InfileRef
infileStatement
numberOfRecords
numberOfVars
duration
;
%global
listHeader
originalNOTES
originalQUOTELENMAX
;
%let macroStart = %sysfunc(datetime());
%let originalNOTES = %sysfunc(getoption(notes));
%let originalQUOTELENMAX = %sysfunc(getoption(noquotelenmax));
%SetSystemOptions(nonotes);
********************************************************************
** Validation
********************************************************************;
%if %IsEmpty(dirData) %then %do;
%put ERROR: [&SYSMACRONAME.] DIRDATA argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if %IsEmpty(fileName) %then %do;
%put ERROR: [&SYSMACRONAME.] FILENAME argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if %IsEmpty(dataset) %then %do;
%put ERROR: [&SYSMACRONAME.] DATASET argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not(%IsEmpty(prefix)) and not(%sysfunc(nvalid(&prefix, v7))) %then %do;
%put ERROR: [&SYSMACRONAME.] Invalid PREFIX="&prefix.";
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%let case = %upcase(&case.);
%if not(&case. in (LOWER, UPPER, MIXED)) %then %do;
%put ERROR: [&SYSMACRONAME.] Invalid case option: &case. Must be LOWER, UPPER, MIX.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%let extension = %upcase(%scan(&fileName., 1, '.', b));
%if not(&extension. in (TXT, TSV, CSV, XLS, XLSX)) %then %do;
%put ERROR: [&SYSMACRONAME.] Invalid file type: &extension. Must be TXT, TSV, CSV, XLS, XLSX.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if &extension. in (XLS, XLSX) and %IsEmpty(sheet) %then %do;
%put ERROR: [&SYSMACRONAME.] SHEET argument undefined.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if &extension. in (XLS, XLSX) and %IsEmpty(range) %then %do;
%put ERROR: [&SYSMACRONAME.] RANGE argument undefined.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(sheet)) %then %do;
%put ERROR: [&SYSMACRONAME.] SHEET argument only valid for XLS or XLSX files.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(range)) %then %do;
%put ERROR: [&SYSMACRONAME.] RANGE argument only valid for XLS or XLSX files.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
**********************************
*** Define delimiter
**********************************;
%if %IsEmpty(delimiter) %then %do;
%if &extension. in (XLS, XLSX) %then %let delimiter = '09'x;
%else %if &extension. = CSV %then %let delimiter = ',';
%else %if &extension. in (TXT, TSV) %then %let delimiter = '09'x;
%else %do;
%put ERROR: [&SYSMACRONAME.] Delimiter error.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%end;
%if &extension. in (XLS, XLSX) and &delimiter ^= '09'x %then %do;
%put WARNING: [&SYSMACRONAME.] Delimiter for Excel file must be '09'x.;
%put WARNING: [&SYSMACRONAME.] Delimiter set to '09'x.;
%let delimiter = '09'x;
%end;
********************************************************************
** Prep Excel Worksheet
********************************************************************;
%if &extension. in (XLS, XLSX) %then %do;
%let DDECommandRef = DDEcmd;
%EstablishDDELink(fileRef=&DDECommandRef.);
%RemoveAllFormattingFromSheet(fileRef=&DDECommandRef., sheet=&sheet.);
%end;
********************************************************************
** Get header
********************************************************************;
**********************************
*** Define file reference
**********************************;
%if &extension. in (XLS, XLSX) %then %do;
%let HeaderRef = DDEHead;
%EstablishDDELink(
fileRef= &HeaderRef.
, dirData= &dirData.
, fileName= &fileName.
, sheetName= &sheet.
, range= &headerRow.
);
%end;
%else %if &extension. in (CSV, TXT, TSV) %then
%let HeaderRef = %sysfunc(dequote(&dirData.))\&fileName.;
%ReadHeaderIntoList(reference=&HeaderRef., delimiter=&delimiter., prefix=&prefix., case=&case.);
********************************************************************
** Create length statement
********************************************************************;
%let lengthStatement = %CreateLengthStatement(&listHeader., &defLength.);
********************************************************************
** Import data
********************************************************************;
**********************************
*** Define infile statement
**********************************;
%if &extension. in (XLS, XLSX) %then %do;
%let InfileRef = DDESheet;
%EstablishDDELink(
fileRef= &InfileRef.
, dirData= &dirData.
, fileName= &fileName.
, sheetName= &sheet.
, range= &range.
);
%let infileStatement = infile &InfileRef. dlmstr=&delimiter. dsd notab &overOption.;
%CompareVariablesToDDERange();
%end;
%else %if &extension. in (CSV, TXT, TSV) %then %do;
%let InfileRef = %sysfunc(dequote(&dirData.))\&fileName.;
%let infileStatement = infile "&InfileRef." dlmstr=&delimiter. dsd &overOption. firstobs = 2 end=last_record;
%end;
**********************************
*** Perform import
**********************************;
data &dataset.;
&infileStatement.;
length &lengthStatement. ;
input (_all_) (:) ;
run;
********************************************************************
** Housekeeping
********************************************************************;
%let numberOfRecords = %GetObsCount(&dataset.);
%let numberOfVars = %GetVarCount(&dataset.);
%SetSystemOptions(notes);
%put;
%put NOTE: [MACRO] The dataset WORK.%upcase(&dataset.) has &numberOfRecords. observations and &numberOfVars. variables.;
%put NOTE: [MACRO] IMPORTDATAFILE macro used (Total process time):;
%let duration = %sysfunc(putn(%sysevalf(%sysfunc(datetime()) - ¯oStart.), time12.3));
%if %sysfunc(minute("&duration."t)) > 0 %then %do;
%put NO%str(TE-) real time %substr(&duration., 3, 8);
%end;
%else %do;
%put NO%str(TE-) real time %substr(&duration., 6, 5) seconds;
%end;
%put;
%SetSystemOptions(&originalNotes., &originalQUOTELENMAX.);
%mend;
%macro EstablishDDELink(fileRef, dirData, fileName, sheetName, range);
%put NOTE: [&SYSMACRONAME] Executing: EstablishDDELink(fileRef=&fileRef, dirData=&dirData, fileName=&fileName, sheetName=&sheetName, range=&range);
%local dirData linkConnection stopTime closeReturnCode;
********************************************************************
** Validate arguments
********************************************************************;
%if %IsEmpty(fileRef) %then %do;
%put ERROR: [&SYSMACRONAME] fileRef is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if %length(&fileRef.) > 8 %then %do;
%put ERROR: [&SYSMACRONAME] Fileref &fileRef exceeds 8 character limit.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%if not %IsEmpty(dirData) %then %let dirData = %sysfunc(dequote(&dirData.));
********************************************************************
** Assign fileref according to link type
********************************************************************;
%if %IsEmpty(dirData)
and %IsEmpty(fileName)
and %IsEmpty(sheetName)
and %IsEmpty(range) %then %EstablishSystemLink(&fileRef.);
%else %EstablishWorkbookLink(&fileRef., &dirData., &fileName., &sheetName., &range.);
********************************************************************
** Check that link has been established
********************************************************************;
%let linkConnection = %sysfunc(fopen(&fileRef, S));
%if not (&linkConnection. > 0) %then %do;
/*Run until either Excel opens (linkConnection > 0)
or until 10 seconds have passed.*/
%let stopTime = %sysevalf(%sysfunc(datetime()) + 10);
%do %until (&linkConnection. > 0);
%if (%sysfunc(datetime()) >= &stopTime.) %then %do;
%put ERROR: [&SYSMACRONAME] DDE system link was not established. Operation timed out.;
%ClearFileRef(fileRef.);
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
%let linkConnection = %sysfunc(fopen(&fileRef, S));
%end;
%end;
********************************************************************
** Housekeeping
********************************************************************;
%let closeReturnCode = %sysfunc(fclose(&linkConnection));
%mend;
%macro ReadHeaderIntoList(reference, delimiter, prefix, case) / minoperator mindelimiter=',';
%put NOTE: [MACRO] Executing: ReadHeaderIntoList(reference=&reference, delimiter=&delimiter, prefix=&prefix, case=&case);
%local fileSpecification notab delimiter;
%global listHeader;
%SetSystemOptions(nonotes);
%if %IsEmpty(reference) %then %do;
%put ERROR: [&SYSMACRONAME.] REFERENCE argument is blank.;
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
********************************************************************
** Determine infile statement options
********************************************************************;
/*SAS filerefs exist only for Excel files*/
%if %IsFileRef(&reference.) %then %do;
%let fileSpecification = &reference.;
%let notab = notab;
%end;
/*Absolute references only for CSV,TXT,TSV files*/
%else %if %IsFilePath(&reference.) %then %do;
%let fileSpecification = "&reference.";
%let notab = ;
%let extension = %upcase(%scan(&reference., 1, '.', b));
%end;
%else %do;
%put ERROR: [&SYSMACRONAME.] Invalid input REFERENCE: [&reference.];
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
%abort cancel;
%end;
********************************************************************
** Read in header
********************************************************************;
data _null_;
infile &fileSpecification. dlmstr = '```#@' ¬ab. obs = 1 lrecl = 32767 ;
length
raw_header_line $ 32767
raw_with_pipes $ 32767
;
input raw_header_line;
raw_with_pipes = tranwrd(raw_header_line, &delimiter., '|');
call symput('rawListHeader', strip(raw_with_pipes));
run;
********************************************************************
** Transform headers into valid variable names
********************************************************************;
%SetSystemOptions(noquotelenmax);
data _null_;
length
i 8
listLength 8
header_i $ 32767
temp_i $ 32767
listValid $ 32767
;
listLength = %ListLength(%superq(rawListHeader));
do i = 1 to listLength;
header_i = scan("%superq(rawListHeader)", i, '|', 'm');
**********************************
*** Apply prefix
**********************************;
if not missing(header_i) then prefixed_i = cats("&prefix.", header_i);
else prefixed_i = header_i;
**********************************
*** Apply case
**********************************;
if "&case." = "LOWER" then cased_i = lowcase(prefixed_i);
else if "&case." = "UPPER" then cased_i = upcase(prefixed_i);
else cased_i = prefixed_i;
**********************************
*** Keep valid otherwise correct
**********************************;
if nvalid(cased_i, 'v7') then do;
if i = 1 then listValid = cased_i;
else listValid = catx('|', listValid, cased_i);
end;
else do;
**********************************
*** Fill in blank headers
**********************************;
if missing(cased_i) and "&case." = "UPPER" then temp_i = "%upcase(&prefix.)NO_HEADER";
else if missing(cased_i) then temp_i = "&prefix.no_header";
**********************************
*** Replace blanks with _ and
*** Remove invalid characters
**********************************;
else do;
replaced_space_with_underscore = tranwrd(strip(cased_i), ' ', '_');
temp_i = compress(replaced_space_with_underscore, '_', 'kin');
end;
**********************************
*** Make first char _ if digit
**********************************;
if anydigit(temp_i) = 1 then temp_i = cats('_', temp_i);
**********************************
*** Trim length to 32
**********************************;
if length(temp_i) > 32 then temp_i = substr(temp_i, 1, 32);
**********************************
*** Verify valid V7 name
**********************************;
if not nvalid(temp_i, 'v7') then do;
put 'ERROR: [&SYSMACRONAME.] Error cleaning header ' i +(-1) '. Invalid SAS name.';
call execute('
%SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
data _null_;
abort cancel nolist;
run;');
stop;
end;
if i = 1 then listValid = temp_i;
else listValid = catx('|', listValid, temp_i);
end;
output;
end;
call symput('listValid', strip(listValid));
run;
********************************************************************
** Append repeated headers with incremented value
********************************************************************;
/*Use hash table with key being each header and value
corresponding to the number of occurences. Create new
header list as follows: If first occurence of a header,
add to list. If not first occurence, ruthlessly append
occurence number (ensuring validity) and add to list.
Beware: SAS documentation for hashes contains syntax
errors.*/
data _null_;
length
element_i $ 32
item $ 32
occurrences 8
new_list $ 32767
;
declare hash h();
h.defineKey('item');
h.defineData('item', 'occurrences');
h.defineDone();
call missing(item, occurrences);
listLength = input("%ListLength(&listValid.)", 8.);
do i = 1 to listLength;
element_i = scan("&listValid.", i, '|');
if not (h.find(key: element_i) = 0) then do;
h.add(key: element_i, data: element_i, data: 1);
new_list = catx('|', new_list, element_i);
end;
else do;
occurrences + 1;
h.replace(key: element_i, data: element_i, data: occurrences);
len = length(element_i);
digits = ceil(log10(occurrences + 1));
if (len + digits) > 32 then
new_element = cats(substr(element_i, 1, len - digits), occurrences);
else new_element = cats(element_i, occurrences);
new_list = catx('|', new_list, new_element);
end;
end;
call symput('listHeader', strip(new_list));
run;
%mend;
%macro CreateLengthStatement(listHeader, defLength);
%local lengthStatement header_h;
%let lengthStatement=;
%do h = 1 %to %ListLength(&listHeader.);
%let header_h = %ListElement(listHeader, &h);
%if &h. = 1 %then %let lengthStatement = &header_h. $ &defLength. ;
%else %let lengthStatement = &lengthStatement. &header_h. $ &defLength. ;
%end;
%let lengthStatement = &lengthStatement;
&lengthStatement
%mend;
1 The solution uses macros extensively. In my experience, people advised me to avoid macros. I found it best, with all respect, to ignore that advice. SAS doesn't have functions, which makes developing abstractions hard. Macros allow you to imitate functions. A common fear of macros is debugging. Stick to the Single Responsibility Principle and you'll find they're not hard to debug at all. Document them with a %put
statement and you'll know who is being called and when. If you're not familiar with macros, they're really just text replacement. The code goes through a pre-processor and replaces macro code with text. That text, and the rest of your code, is then executed. The best resource for learning about macros is the manual.
Upvotes: 0
Reputation: 51621
Define the type in Excel.
If you want to convert it later then use a data step to convert the column.
data want ;
length id1 $9 id2 $3 ;
set rawdata(rename=(id2=numeric_id2));
id2=cats(numeric_id2);
drop numeric_id2;
run;
Upvotes: 0
Reputation: 12944
You can try to set the columns type as "Text" in Excel to see if SAS will determine it from that. Worth a shot.
If that doesn't work, unless you use PC Files Server, or have Excel of the same bitness installed on the same SAS server for direct access to the file, you will need to use a separate data step to convert the columns.
proc import
file = "c:\rawdata.xlsx"
out=_rawdata(rename=(ID2 = _ID2) )
dbms = xlsx replace;
run;
data rawdata;
format ID1 $9. ;
format ID2 $3. ;
format MONTH best2. ;
format YEAR best4. ;
format QTR best1. ;
format VAR1 best3. ;
format VAR2 best4. ;
set _rawdata;
ID2 = cats(_ID2);
drop _:;
run;
If you do have SAS/Access to Excel, you can control these variables directly with the DBDSOPTS data set option. For example:
libname myxlsx Excel 'C:\rawdata.xlsx';
data rawdata;
set myxlsx.'Sheet1$'n(DBDSOPTS="DBTYPE=(ID2='CHAR(3)')");
run;
The reason why the problem is occurring is because the xlsx
engine in proc import
is internal to SAS, and is separate from the Excel
engine. The Excel
engine uses Microsoft Jet or Ace, whereas the xlsx
engine uses a proprietary system that does not have as much control as Microsoft's. Why this is the case, I have no idea.
When proc import
is run, SAS will try to guess what format it should be (which you do have control over with xls files using the guessingrows
option). If it detects all numbers, it will assume a numeric variable. Unfortunately, without SAS/ACCESS to Excel or PC Files Server installed, you cannot control the variable type directly.
Upvotes: 5