R Vectorized numbering of rows in a dataframe

Question

I have a large ordered dataframe consisting of a number of related records. For each group of related records, I need to number them from 1 to the total number of related records. If I iterate over the whole dataframe, the operation takes too long.

I'm wondering if there is a vectorized way to do this?

For example, if I had this dataframe:

ID  Month     State
1   Apr-2014  AL
2   May-2014  AL
3   Jun-2014  AL
4   Apr-2014  MN
5   May-2014  MN
6   Apr-2014  FL
7   May-2014  FL

I'd like to end up with:

ID  Month     State  Seq
1   Apr-2014  AL     1
2   May-2014  AL     2
3   Jun-2014  AL     3
4   Apr-2014  MN     1
5   May-2014  MN     2
6   Apr-2014  FL     1
7   May-2014  FL     2

akrun · Accepted Answer

Using the example dataset, arranged so that the rows are ordered by Month (see "data" at the end of this answer): if the dataset is ordered, you can compare each row of Month with the previous row and check whether they differ. In the code below, df$Month[-1] drops the first observation and df$Month[-nrow(df)] drops the last one, so the two vectors have equal length and line up as current/previous pairs. Comparing them with != gives TRUE wherever the value changes. Prepend TRUE for the first row and take the cumsum to get the index.

 df$Seq <- cumsum(c(TRUE, df$Month[-1] != df$Month[-nrow(df)]))
 df
 #  ID    Month State Seq
 #1  1 Apr-2014    AL   1
 #2  2 Apr-2014    MN   1
 #3  3 Apr-2014    FL   1
 #4  4 May-2014    AL   2
 #5  5 May-2014    MN   2
 #6  6 May-2014    FL   2
 #7  7 Jun-2014    AL   3
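
To make the comparison step easier to follow, here is a minimal sketch that spells out the intermediate vectors. The names current, previous, changed, starts and seq_idx are mine, chosen only for illustration, and the Month values are copied from the example data.

```r
# Month column from the example data (rows ordered by Month).
month <- c("Apr-2014", "Apr-2014", "Apr-2014",
           "May-2014", "May-2014", "May-2014", "Jun-2014")

current  <- month[-1]              # rows 2..7
previous <- month[-length(month)]  # rows 1..6
changed  <- current != previous    # TRUE where the value differs from the row above
# changed: FALSE FALSE TRUE FALSE FALSE TRUE

starts  <- c(TRUE, changed)        # the first row always starts a new run
seq_idx <- cumsum(starts)          # running count of runs seen so far
seq_idx
# 1 1 1 2 2 2 3
```

Note that this counts runs of equal adjacent values, so it relies on the rows being ordered. If the same Month showed up again further down, it would get a new index rather than reusing the earlier one.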

Or you can convert the Month column to a factor whose levels are in order of first appearance, and then convert that factor to numeric to get its integer codes.

 as.numeric(factor(df$Month, levels=unique(df$Month)))
 #[1] 1 1 1 2 2 2 3
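
The levels = unique(...) part matters here. As a small sketch (the variable names are only for illustration), compare it with the default level order, which is sorted and therefore puts "Jun-2014" before "May-2014". match() against the unique values should give the same result as the factor with explicit levels.

```r
month <- c("Apr-2014", "Apr-2014", "Apr-2014",
           "May-2014", "May-2014", "May-2014", "Jun-2014")

# Levels in order of first appearance: Apr, May, Jun.
by_appearance <- as.integer(factor(month, levels = unique(month)))
by_appearance
# 1 1 1 2 2 2 3

# Default levels are sorted: Apr, Jun, May, so May becomes 3 and Jun becomes 2.
by_sorted <- as.integer(factor(month))
by_sorted
# 1 1 1 3 3 3 2

# Position of each value among the unique values, without building a factor.
via_match <- match(month, unique(month))
via_match
# 1 1 1 2 2 2 3
```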

Or, using data.table:

 library(data.table)
 DT <- setDT(df)[, Seq := .GRP, by = Month]
 DT
 #   ID    Month State Seq
 #1:  1 Apr-2014    AL   1
 #2:  2 Apr-2014    MN   1
 #3:  3 Apr-2014    FL   1
 #4:  4 May-2014    AL   2
 #5:  5 May-2014    MN   2
 #6:  6 May-2014    FL   2
 #7:  7 Jun-2014    AL   3

.GRP is a special data.table symbol that holds an integer group counter: 1 for the first group, 2 for the second, and so on. Have a look at ?data.table to read more about it.
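
Since .GRP numbers the groups rather than the rows inside a group, it may help to see it next to a per-group running count. The sketch below rebuilds the example data as a data.table and adds both; the column names MonthGroup and StateSeq are my own. On this particular data the two columns happen to agree, but the running count by State does not depend on the rows being sorted by Month.

```r
library(data.table)

# Same rows as the example data (ordered by Month).
DT <- data.table(
  ID    = 1:7,
  Month = c("Apr-2014", "Apr-2014", "Apr-2014",
            "May-2014", "May-2014", "May-2014", "Jun-2014"),
  State = c("AL", "MN", "FL", "AL", "MN", "FL", "AL")
)

# .GRP: one counter value per group, shared by every row of that group.
DT[, MonthGroup := .GRP, by = Month]

# seq_len(.N): running count 1..N within each group (.N is the group's row count).
DT[, StateSeq := seq_len(.N), by = State]

DT$MonthGroup
# 1 1 1 2 2 2 3
DT$StateSeq
# 1 1 1 2 2 2 3
```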

data

 df <- structure(list(ID = 1:7, Month = c("Apr-2014", "Apr-2014", "Apr-2014",
 "May-2014", "May-2014", "May-2014", "Jun-2014"), State = c("AL",
 "MN", "FL", "AL", "MN", "FL", "AL")), .Names = c("ID", "Month",
 "State"), class = "data.frame", row.names = c(NA, -7L))
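
If your data is laid out as in the question instead (each State in one contiguous block), a base R sketch along these lines should number the rows within each State. The data frame name qdf is hypothetical and the rows are copied from the question; ave() and the rle()/sequence() pair are standard base R functions, but treat this as one possible approach rather than the only one.

```r
# The question's layout: each State forms one contiguous block of rows.
qdf <- data.frame(
  ID    = 1:7,
  Month = c("Apr-2014", "May-2014", "Jun-2014",
            "Apr-2014", "May-2014", "Apr-2014", "May-2014"),
  State = c("AL", "AL", "AL", "MN", "MN", "FL", "FL"),
  stringsAsFactors = FALSE
)

# ave() applies seq_along within each State and keeps the original row order.
qdf$Seq <- ave(seq_along(qdf$State), qdf$State, FUN = seq_along)
qdf$Seq
# 1 2 3 1 2 1 2

# Alternative that relies on the blocks being contiguous:
# run lengths (3, 2, 2) expanded to 1..3, 1..2, 1..2.
seq_rle <- sequence(rle(qdf$State)$lengths)
seq_rle
# 1 2 3 1 2 1 2
```

The ave() version does not need the rows to be sorted; the rle() version does, because it restarts the count at every change of value.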
