Madeleine Thornton

Reputation: 41

How can I derive a variable in R showing the number of observations that have the same value recorded at earlier dates?

I am using R and I have a data frame containing info about the applications made by individuals for a grant. Individuals can apply for a grant as many times as they like. I want to derive a new variable that tells me how many applications each individual has made up to and including the date of the application represented by each record.

At the moment my data looks like this:

app number  date app made     applicant
1           2012-08-01        John
2           2012-08-02        John
3           2012-08-02        Jane
4           2012-08-04        John
5           2012-08-08        Alice
6           2012-08-09        Alice
7           2012-08-09        Jane

And I would like to add a further variable so my data frame looks like this:

app number  date app made    applicant  applications by applicant to date
1           2012-08-01       John       1
2           2012-08-02       John       2
3           2012-08-02       Jane       1
4           2012-08-04       John       3
5           2012-08-08       Alice      1
6           2012-08-09       Alice      2
7           2012-08-09       Jane       2

I'm new to R and I'm really struggling to work out how to do this. The closest I have come is something like the answer to this question: How do I count the number of observations at given intervals in R?

But I can't work out how to do this based on the date in each record rather than on pre-set intervals.

Upvotes: 4

Views: 366

Answers (3)

Greg Snow

Reputation: 49650

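Here is a one-line approach using the ave function. It does not require reordering the data and leaves the rows in their original order. Because seq_along numbers each applicant's rows in the order they appear, it assumes the rows are already in date order, as they are in the example: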

A$applications <- ave(A$app.number, A$applicant, FUN=seq_along)
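If the rows might not be in date order within each applicant, one possible variation (a sketch, not part of the original answer) is to run the same ave/seq_along numbering over a date-sorted index and write the results back by position, so the data frame itself is never reordered. The column names app.number and date.app.made are assumed to be what read.table makes of the question's headers:

```r
# Rebuild the example data from the question. Column names are assumed to
# match what read.table(header = TRUE) produces from "app number",
# "date app made" and "applicant".
A <- data.frame(
  app.number    = 1:7,
  date.app.made = as.Date(c("2012-08-01", "2012-08-02", "2012-08-02",
                            "2012-08-04", "2012-08-08", "2012-08-09",
                            "2012-08-09")),
  applicant     = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane")
)

# Row positions in date order; order() is stable, so same-day rows keep
# their original relative order.
o <- order(A$date.app.made)

# Number each applicant's rows 1, 2, ... in date order, then write the
# counts back to the rows they came from, leaving A's row order unchanged.
A$applications <- NA_integer_
A$applications[o] <- ave(seq_along(o), A$applicant[o], FUN = seq_along)

A$applications
```

Because the counts are assigned through the index o, shuffling the rows of A before running this would still give each application its date-order count.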

Upvotes: 5

Justin

Reputation: 43265

You can use plyr for this. If your data is in a data.frame dat, I would add a column called count and then take its cumsum within each name:

library(plyr)
dat <- structure(list(number = 1:7, date = c("2012-08-01", "2012-08-02", 
"2012-08-02", "2012-08-04", "2012-08-08", "2012-08-09", "2012-08-09"
), name = c("John", "John", "Jane", "John", "Alice", "Alice", 
"Jane")), .Names = c("number", "date", "name"), row.names = c(NA, 
-7L), class = "data.frame")

dat$count <- 1

ddply(dat, .(name), transform, count=cumsum(count))

  number       date  name count
1      5 2012-08-08 Alice     1
2      6 2012-08-09 Alice     2
3      3 2012-08-02  Jane     1
4      7 2012-08-09  Jane     2
5      1 2012-08-01  John     1
6      2 2012-08-02  John     2
7      4 2012-08-04  John     3

I assumed your dates were already sorted; however, you might want to sort them explicitly before you do your "counting":

dat <- dat[order(dat$date),]
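The same add-a-column-of-ones-then-cumsum idea also works in base R without plyr, using ave() to take the cumulative sum within each name. A minimal sketch, rebuilding dat with data.frame() instead of the dput() output; unlike ddply, it keeps the rows in date order rather than grouping them by name:

```r
dat <- data.frame(
  number = 1:7,
  date   = c("2012-08-01", "2012-08-02", "2012-08-02", "2012-08-04",
             "2012-08-08", "2012-08-09", "2012-08-09"),
  name   = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane")
)

# Sort by date first; ISO "YYYY-MM-DD" strings sort correctly as text,
# and order() is stable, so same-day rows keep their relative order.
dat <- dat[order(dat$date), ]

# A column of ones, cumulatively summed within each name.
dat$count <- ave(rep(1, nrow(dat)), dat$name, FUN = cumsum)

dat[, c("number", "name", "count")]
```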

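As per the comment, this can be simplified if you understand (which I didn't!) the way transform works. This relies on the dates already being sorted within each name, since order(date) then returns 1, 2, ...: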

ddply(dat, .(name), transform, count=order(date))
  number       date  name count
1      5 2012-08-08 Alice     1
2      6 2012-08-09 Alice     2
3      3 2012-08-02  Jane     1
4      7 2012-08-09  Jane     2
5      1 2012-08-01  John     1
6      2 2012-08-02  John     2
7      4 2012-08-04  John     3
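Keep in mind that order() returns the positions that would sort a vector, not each element's rank, so order(date) equals the running count only when each name's dates are already sorted. With unsorted dates, rank(date, ties.method = "first") gives each date's position in date order instead, breaking same-day ties by row order. A small comparison on hypothetical dates:

```r
# Hypothetical application dates for one name, deliberately out of order
# (ISO "YYYY-MM-DD" strings sort correctly as text).
d <- c("2012-08-09", "2012-08-01", "2012-08-04")

order(d)                        # positions that would sort d: 2 3 1
rank(d, ties.method = "first")  # each date's position in date order: 3 1 2
```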

Upvotes: 5

tim riffe

Reputation: 5691

Here's a less elegant way than @Justin's:

    A <- read.table(text='"app number"  "date app made"     "applicant"
    1           2012-08-01        John
    2           2012-08-02        John
    3           2012-08-02        Jane
    4           2012-08-04        John
    5           2012-08-08        Alice
    6           2012-08-09        Alice
    7           2012-08-09        Jane',header=TRUE)

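    # order by applicant name; order() is stable, so each applicant's
    # rows stay in their original (date) order
    A <- A[order(A$applicant), ]
    # get the vector you're looking for: 1, 2, ... within each applicant
    # (lapply rather than sapply, which would return a matrix instead of
    # a list if every applicant had the same number of applications)
    A$app2date <- unlist(lapply(unique(A$applicant), function(x, appl){
                         seq_len(sum(appl == x))
                       }, appl = A$applicant)
                     )
    # back in original order:
    A <- A[order(A$app.number), ]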
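One caveat when building the counts with sapply() in this way: if every applicant has the same number of applications, sapply() simplifies the result to a matrix rather than a list, whereas lapply() always returns a list that unlist() flattens in applicant order. A minimal sketch with hypothetical applicant names:

```r
# Hypothetical data: two applicants with two applications each.
applicant <- c("Alice", "Bob", "Alice", "Bob")

# 1, 2, ... for each application by one applicant
per_name <- function(x) seq_len(sum(applicant == x))

s <- sapply(unique(applicant), per_name)  # equal lengths: simplified to a 2 x 2 matrix
l <- lapply(unique(applicant), per_name)  # always a list

is.matrix(s)  # TRUE
unlist(l)     # 1 2 1 2
```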

Upvotes: 5
