Cumulative number of unique values in a column up to current row

Question

I have a data frame, donorInfo, with donor information:

id        giftdate     giftamt
002       2001-01-05     25.00
033       2001-05-08     50.00
054       2001-09-22    125.00
125       2001-11-05     40.00
042       2001-12-04     75.00
...           ...         ...

I'd like to create a column that shows the cumulative number of unique donor ids up to that date. I think it's something like:

donorInfo$numUnique <- apply/lapply (donorInfo, 1, FUN=nrow(unique(donorInfo$id)))

Unfortunately, this isn't working, and I'm wondering how to fix it. Thanks for any suggestions.

Josh O'Brien · Accepted Answer

You can do this with duplicated() and cumsum(). This works because cumsum() coerces a logical vector to numeric, counting TRUE as 1 and FALSE as 0:

# Example data.frame with some duplicated ids
df <- read.table(text="
id   giftdate giftamt
 2 2001-01-05      25
33 2001-05-08      50
 2 2001-09-22     125
33 2001-11-05      40
42 2001-12-04      75", header=TRUE)

cumsum(!duplicated(df$id))
# [1] 1 2 2 2 3
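
To make the one-liner easier to follow, here is a minimal sketch that breaks it into steps and stores the result as a column, which is what the question asked for. It assumes the rows should be in date order before counting, so it sorts by giftdate first; the names first_seen and numUnique are just illustrative.

```r
# Same example data as above, re-created so this block runs on its own
df <- read.table(text = "
id   giftdate giftamt
 2 2001-01-05      25
33 2001-05-08      50
 2 2001-09-22     125
33 2001-11-05      40
42 2001-12-04      75", header = TRUE)

# Assumption: "up to that date" means rows must be in chronological order
df <- df[order(as.Date(df$giftdate)), ]

# TRUE the first time each id appears, FALSE on every repeat
first_seen <- !duplicated(df$id)
first_seen
# [1]  TRUE  TRUE FALSE FALSE  TRUE

# The running total of first appearances is the cumulative unique count
df$numUnique <- cumsum(first_seen)
df$numUnique
# [1] 1 2 2 2 3
```

If the data frame is already sorted by date, the order() step changes nothing and can be dropped.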
