vmaha13
vmaha13

Reputation: 21

Reading excel file in sas with date columns

I am facing an issue, while importing an excel file into sas environment. So basically in the Excel file there are few columns named as

Geography   
AR_NO   
31-Jan-18   
28-Feb-18   
31-Mar-18   
30-Apr-18   
31-May-18   
30-Jun-18

After using below the code - >

%macro FX_Lkup(sheet);
FILENAME FXFILE "/idn/home/Module2/excel.xlsx";

PROC IMPORT DATAFILE=FXFILE
       DBMS=XLSX
       OUT=&sheet. 
       REPLACE
       ;
       SHEET="&sheet."; 
       RUN;

%mend FX_Lkup;
%FX_Lkup(LENDING_TEMPLATE);
%FX_Lkup(2018FXRates);

SAS data print the columns as 
    Geography   
    AR_NO   
    43131
    43159
    43190
    43220
and so on.

Does any have solution on that? Any lead would be really appreciated : )

Thanks !

Upvotes: 0

Views: 4166

Answers (3)

Richard
Richard

Reputation: 27526

IMPORT is using Excel date valued column headers as Excel epoch date numbers.

Use Proc DATASETS to change the column label, and possibly rename the columns to something generic such as DATE1-DATE6. Or, continue on, and further unpivot the data into a categorical form with columns GEO, AR_NO, DATE, VALUE

You might be asking yourself "Where do those numbers, such as 43131, come from?", or "What is an Excel epoch date number?"

They are unformatted Excel date values. The human readable date represented by a number is determined by a systems epoch, or the date represented by the number 0.

Different systems of time keeping have different epochs (starting points) and time units. Some examples:

  • 21DEC1899 Excel datetime number 0, 1 = 1 day
  • 01JAN1960 SAS date number 0, 1 = 1 day
  • 01JAN1960 SAS datetime number 0, 1 = 1 second
  • 01JAN1970 Unix OS datetime number 0, 1 = 1 second

To convert an Excel date number to a SAS date number you need to subtract 21916, which is the number of days from 31DEC1899 to 01JAN1960

This understanding of date epochs will be used when setting the label of a SAS column and renaming the column.

For others fiddling with code, the following will create an Excel worksheet having Date valued column headers. I speculate such a situation can otherwise arise when importing a worksheet containing an Excel pivot table.

First create some sample SAS data

data demo_tall;
  do Geography = 'Mountains', 'Plains';
  do AR_NO = 1 to 3;
    _n_ = 0;
    length label $200;
    do label = '31-Jan-18', '28-Feb-18', '31-Mar-18',
               '30-Apr-18', '31-May-18', '30-Jun-18'
    ;
        _n_ + 1;
        name = cats('Date',_n_);
        value + 1;
        output;
    end;
  end;
  end;
run;

proc transpose data=demo_tall out=demo_wide(drop=_name_);
  by Geography AR_NO;
  var value;
  id name;
  idlabel label;
run;

Sample SAS data set (pivoted with transpose) enter image description here

Then create Excel sheet with Excel date valued and formatted column headers

ods noresults;
ods excel file='%TEMP%\across.xlsx' options(sheet_name='Sample');

data _null_;
  declare odsout xl();

  if 0 then set demo_wide;
  length _name_ $32;

  xl.table_start();
    * header;
    xl.row_start();
      do _n_ = 1 to 100; * 100 column guard;
        call vnext(_name_);
        if _name_ = '_name_' then leave;

        _label_ = vlabelx(_name_);
        _date_ = input(_label_, ?? date9.);

        * make some header cells an Excel date formatted value;

        if missing(_date_) then
          xl.format_cell(data:_label_);
        else
          xl.format_cell(
            data:_date_,
            style_attr:"width=9em tagattr='type:DateTime format:dd-mmm-yy'"
          );
      end;
    xl.row_end();

    * data rows;
    do _n_ = 1 by 1 while (not lastrow);
      set demo_wide end=lastrow;

      xl.row_start();
        call missing(_name_);
        do _index_ = 1 to 100; * 100 column guard;
          call vnext(_name_);
          if _name_ = '_name_' then leave;

          xl.format_cell(data:vvaluex(_name_));
        end;
      xl.row_end();
    end;

  xl.table_end();

  stop;
run;

ods excel close;
ods results;

Excel file created
enter image description here

IMPORT Excel worksheet

Log will show the 'funkiness' of date valued column headers

options msglevel=I;

proc import datafile='%temp%\across.xlsx' dbms=xlsx replace out=want;
  sheet = "Sample";
run;

proc contents noprint data=want out=want_meta(keep=name label varnum);
run;
----- LOG -----
1380  proc import datafile='%temp%\across.xlsx' dbms=xlsx replace out=want;
1381    sheet = "Sample";
1382  run;

NOTE:    Variable Name Change.  43131 -> _43131
NOTE:    Variable Name Change.  43159 -> _43159
NOTE:    Variable Name Change.  43190 -> _43190
NOTE:    Variable Name Change.  43220 -> _43220
NOTE:    Variable Name Change.  43251 -> _43251
NOTE:    Variable Name Change.  43281 -> _43281
NOTE: VARCHAR data type is not supported by the V9 engine. Variable Geography has been converted
      to CHAR data type.
NOTE: The import data set has 6 observations and 8 variables.
NOTE: WORK.WANT data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Modify the header (metadata) of the imported data set

Date valued column names will be renamed DATE1-DATE6 and the label will be changed to be the corresponding date in SAS format DATE11. (dd-mon-yyyy)


%let renames=;
%let labels=;

data _null_;
  length newname $32;
  length renames labels $32767;
  retain renames labels;

  set meta end=lastvar;

  date = coalesce(input(label, ?? 5.),1e9) + '31dec1899'd;


  if '01jan1980'd < date < today() then do;
    index + 1;

    newname = cats('DATE',index);
    label = quote(trim(put(date,date11.)));

    labels  = catx(' ', labels,  catx('=',name,label));
    renames = catx(' ', renames, catx('=',name,newname));
  end;

  if lastvar;

  if not missing(labels)  then call symput('labels',  trim('LABEL '  || labels));
  if not missing(renames) then call symput('renames', trim('RENAME ' || renames));
run;

proc datasets nolist lib=work;
  modify want;
  &labels;
  &renames;
run;
quit;

%symdel labels renames;

%let syslast = want;

The result, when printed.

enter image description here enter image description here

Optional

Unpivot to a categorical form (tall layout)

proc transpose data=want out=stage1(rename=(col1=value _label_=date_string));
  by geography ar_no;
  var date:;
  label _name_ = ' ';
  label date_string = ' ';
run;

data want_tall;
  set stage1;
  date = input (date_string, date11.);
  format date date11.;
  keep geography ar_no _name_ date value;
run;

Upvotes: 0

Tom
Tom

Reputation: 51621

If you have mixed character and numeric values in the same column then SAS will be forced to create the variable as character. When it does that it stores the number that Excel uses for the date as a string of digits. To convert it to a date in SAS first convert the string of digits to a number and then adjust the number to account for the difference in how SAS and Excel count days.

data want ;
  set LENDING_TEMPLATE ;
  date = input(geography,??32.) + '30DEC1899'd;
  format date date9.;
run;

Dates as headers

If your excel file is using dates as column headers then SAS will also convert them to digit strings since variable names are always characters strings, not numbers. One quick way to fix it is to use PROC TRANSPOSE. This will be easy when each row is uniquely identified by the other variables and when all of the "date" variables are numeric.

proc transpose data=LENDING_TEMPLATE out=tall ;
  by geography ar_no ;
run;

data tall ;
  set tall ;
  date = input(_name_ , 32.) + '30DEC1899'd ;
  format date date9. ;
  drop _name_;
run;

You could stop here as now you have a useful dataset where the date values are in a variable instead of hiding in the metadata (variable name).

To get back to your original wide layout just add another PROC TRANSPOSE and tell it to use DATE as the ID variable.

 proc transpose data=tall out=wide ;
    by geography ar_no ;
    id date;
    var col1;
 run;

Upvotes: 0

Niqua
Niqua

Reputation: 644

It is correctly imported, SAS uses numbers to store dates. in order to have a date in your final table, you need to declare format = AFRDFDE7. for instance

Upvotes: 1

Related Questions