Jamesmartinn
Jamesmartinn

Reputation:

SAS and Date operations

I've tried googling and I haven't turned up any luck to my current problem. Perhaps someone can help?

I have a dataset with the following variables:

ID, AccidentDate

It's in long format, and each participant can have more than 1 accident, with participants having not necessarily an equal number of accidents. Here is a sample:

Code:

ID  AccidentDate
1   1JAN2001
2   4MAY2001
2   16MAY2001
3   15JUN2002
3   19JUN2002
3   05DEC2002
4   04JAN2003

What I need to do is count the number of days between each individuals First and Last recorded accident date. I've been playing around with first.byvariable and last.byvariable commands, but I'm just not making any progress. Any tips? or Any links to a source?

Thank you,

Also. I posted this originally over at Talkstats.com (cross-posting etiquette)

Upvotes: 0

Views: 1976

Answers (3)

Nelson Wu
Nelson Wu

Reputation: 98

Assuming the data looks like:

ID  AccidentDate 
1   1JAN2001 
2   4MAY2001 
2   16MAY2001 
3   15JUN2002 
3   19JUN2002 
3   05DEC2002 
4   04JAN2003

You have the right idea. Retain the first accident date in order to have access to both the first and last dates. Then calculate the difference.

proc sort data=accidents;
  by id accidentdate
run;

data accidents; 
  set accidents;
  by id;
  retain first_accidentdate;
  if first.id then first_accidentdate = accidentdate;
  if last.id then do;
    daysbetween = date - first_accidentdate
    output;
  end;
run;

Upvotes: 0

o.h
o.h

Reputation: 1262

By long format I think you mean it is a "stacked" dataset with each person having multiple observations (instead of one row per person with multiple columns). In your situation, it is probably the correct way to have the data stored.

To do it with data steps, I think you are on the right track with first. and last.

I would do it like this:

proc sort data=accidents;
  by id date;
run;

data accidents; set accidents;
  by id accident; *this is important-it makes first. and last. available for use;
  retain first last;
  if first.date then first=date;
  if last.date then last=date;
run;

Now you have a dataset with ID, Date, Date of First Accident, Date of Last Accident

You could calculate the time between with

data accidents; set accidents;
  timebetween = last-first;
run;

You can't do this directly in the same data step since the "last" variable won't be accurate until it has parsed the last line and as such the data will be wrong for anything but the last accident observation.

Upvotes: 0

Feng Mai
Feng Mai

Reputation: 3109

Not sure what you mean by in long format long format should be like this

  id     accident  date
  1       1       1JAN2001 
  1       2       1JAN2002 
  2       1       1JAN2001 
  2       2       1JAN2003 

Then you can try proc sql like this

Proc Sql;
select id, max(date)-min(date) from table;
group by id;
run;

Upvotes: 4

Related Questions