Reputation: 28169
I have a data frame, donorInfo, with donor information:
id giftdate giftamt
002 2001-01-05 25.00
033 2001-05-08 50.00
054 2001-09-22 125.00
125 2001-11-05 40.00
042 2001-12-04 75.00
... ... ...
I'd like to create a column that shows the cumulative number of unique donor ids up to that date. I think it's something like:
donorInfo$numUnique <- apply/lapply (donorInfo, 1, FUN=nrow(unique(donorInfo$id)))
Unfortunately, this isn't working, and I'm wondering how to fix it. Thanks for any suggestions.
Upvotes: 3
Views: 393
Reputation: 162401
You can do this with duplicated() and cumsum(), taking advantage of the fact that logical vectors are coerced to numeric ones (TRUE becomes 1, FALSE becomes 0):
# Example data.frame with some duplicated ids
df <- read.table(text="
id giftdate giftamt
2 2001-01-05 25
33 2001-05-08 50
2 2001-09-22 125
33 2001-11-05 40
42 2001-12-04 75", header=TRUE)
cumsum(!duplicated(df$id))
# [1] 1 2 2 2 3
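To attach this to the question's donorInfo and count "up to that date", the rows probably need to be in date order first. Here is a minimal sketch, assuming giftdate is stored as yyyy-mm-dd text and id is kept as character so the leading zeros survive. The sample rows are made up for illustration:

```r
# Illustrative data (not the asker's real data); ids kept as character
donorInfo <- data.frame(
  id       = c("033", "002", "002", "042", "033"),
  giftdate = c("2001-05-08", "2001-01-05", "2001-09-22", "2001-12-04", "2001-11-05"),
  giftamt  = c(50, 25, 125, 75, 40),
  stringsAsFactors = FALSE
)

# Sort by gift date so the running count follows time order
donorInfo$giftdate <- as.Date(donorInfo$giftdate)
donorInfo <- donorInfo[order(donorInfo$giftdate), ]

# !duplicated() is TRUE at the first appearance of each id; cumsum() counts them
donorInfo$numUnique <- cumsum(!duplicated(donorInfo$id))
donorInfo$numUnique
# [1] 1 2 2 2 3
```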
Upvotes: 8
Reputation: 11946
Try something like this:
donorInfo$numUnique <- sapply(seq_len(nrow(donorInfo)), function(rn) {
  # number of distinct ids among the first rn rows
  length(unique(donorInfo$id[seq_len(rn)]))
})
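No doubt this isn't the most efficient solution (it recomputes unique() for every row, so the work grows quadratically with the number of rows), but it should work.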
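As a quick sanity check, the row-by-row count can be compared against cumsum(!duplicated(...)) on a small vector. The ids below are made up for illustration:

```r
ids <- c(2, 33, 2, 33, 42)  # illustrative ids

# Row by row: count the distinct ids among the first rn entries
slow <- sapply(seq_along(ids), function(rn) length(unique(ids[seq_len(rn)])))

# Vectorised: running count of first appearances
fast <- cumsum(!duplicated(ids))

all(slow == fast)
# [1] TRUE
```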
Upvotes: 2