Reputation: 835
I am converting some SAS code to R, but I am having trouble replicating the IF First. and Last. logic in R. The SAS code is:
Data A;
Set B;
BY CompID, Id, Date;
IF First.Date;
run;
My understanding is that only the earliest date for a CompID, ID and Date combination is chosen and output into data A. Am I right?
I am aware of the duplicated command in R but if I use the following code -
A <- B[!duplicated(B$Date),]
I get lesser observations than my SAS output. Am I missing on something here?
Thanks in advance.
Upvotes: 0
Views: 1473
Reputation: 263301
The construction in R could be (since there is also a duplicated.data.frame
function):
A <- B[!duplicated(B[ c('CompID', 'Id', 'Date') ] ) ,]
To replicate a Last. operation, look at the help page for duplicated (?duplicated). It has a fromLast argument, and setting fromLast = TRUE keeps the last row of each group instead of the first.
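As a minimal sketch of both cases, assuming a small made-up data frame in place of B (the values and the names keys, first_rows and last_rows are invented for illustration), the First. and Last. logic could look like this:

```r
# Toy data standing in for B; the values are invented for illustration
B <- data.frame(
  CompID = c(1, 1, 1, 2, 2),
  Id     = c(10, 10, 10, 20, 20),
  Date   = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                     "2020-01-01", "2020-01-01")),
  Value  = c(5, 6, 7, 8, 9)
)

# SAS expects B to be sorted by the BY variables, so mirror that here
B <- B[order(B$CompID, B$Id, B$Date), ]

# The columns that play the role of the BY variables
keys <- B[c("CompID", "Id", "Date")]

# IF First.Date: keep the first row of each CompID/Id/Date group
first_rows <- B[!duplicated(keys), ]

# IF Last.Date: keep the last row of each CompID/Id/Date group
last_rows <- B[!duplicated(keys, fromLast = TRUE), ]
```

Sorting first matters because duplicated() works in row order, so "first" and "last" only match the SAS result when the rows are in the same order as the sorted SAS data set.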
The construction: "I get lesser observations than ..." sounds wrong to me, but I have not traveled in all the English-speaking countries. At least in the US, I think "fewer" or "a lower count" would read a bit easier.
Upvotes: 2
Reputation:
First of all, the statement BY CompID, Id, Date;
should not have any commas in it. It should read BY CompID Id Date;
Secondly, A <- B[!duplicated(B$Date),]
is not the equivalent of the SAS code you posted. It looks only at Date and ignores CompID and Id, which is why it returns fewer rows.
The SAS equivalent of that R line would be:
Data A;
Set B;
BY Date;
IF First.Date;
run;
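To illustrate the difference in row counts, here is a small sketch in R. The data frame is made up for the example, and by_date and by_keys are hypothetical names:

```r
# Toy data standing in for B; the values are invented for illustration
B <- data.frame(
  CompID = c(1, 1, 1, 2, 2),
  Id     = c(10, 10, 10, 20, 20),
  Date   = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                     "2020-01-01", "2020-01-01"))
)

# Deduplicating on Date alone: one row per distinct date in the whole table
by_date <- B[!duplicated(B$Date), ]

# Deduplicating on all three BY variables: one row per CompID/Id/Date group
by_keys <- B[!duplicated(B[c("CompID", "Id", "Date")]), ]

nrow(by_date)  # rows kept when only Date is considered
nrow(by_keys)  # rows kept when CompID, Id and Date are considered
```

In this toy data the same date appears under two different CompID values, so the Date-only version drops a row that the three-key version keeps.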
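"My understanding is that only the earliest date for a CompID, ID and Date combination is chosen and output into data A. Am I right?"
Your understanding is essentially correct: provided B is sorted by the BY variables, IF First.Date keeps only the first observation of each CompID, Id and Date combination.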
Upvotes: 1