Peter Chen
Peter Chen

Reputation: 1484

Give order number by group

I know data.table package includes an argument .GRP to give order numbers by groups. However, if I do not want the order number starts from 1, how to do?

# dt
YEAR   NAME   ST
2010   AAA    AK
2011   AAA    AK
2012   AAA    AK
2010   BBB    AL
2011   BBB    AL
2010   CCC    CA

dt[, ID := .GRP, .(NAME, ST)]

YEAR   NAME   ST    ID
2010   AAA    AK     1 
2011   AAA    AK     1
2012   AAA    AK     1
2010   BBB    AL     2
2011   BBB    AL     2
2010   CCC    CA     3

Now, I explain clearly my question.
If I add new many new observations in it. How to add the ID number from previous.

YEAR   NAME   ST    ID
2010   AAA    AK     1 
2011   AAA    AK     1
2012   AAA    AK     1
2010   BBB    AL     2
2011   BBB    AL     2
2010   CCC    CA     3
2010   DDD    LA     
2011   DDD    LA    
2015   DDD    LA    
2016   DDD    LA    
2010   EEE    GA    
2011   EEE    GA    
2014   FFF    PA     


YEAR   NAME   ST    ID
2010   AAA    AK     1 
2011   AAA    AK     1
2012   AAA    AK     1
2010   BBB    AL     2
2011   BBB    AL     2
2010   CCC    CA     3
2010   DDD    LA     4
2011   DDD    LA     4
2015   DDD    LA     4
2016   DDD    LA     4
2010   EEE    GA     5
2011   EEE    GA     5
2014   FFF    PA     6

Upvotes: 1

Views: 81

Answers (1)

SymbolixAU
SymbolixAU

Reputation: 26258

My idea is to keep a reference to the maximum ID value already assigned, then use this to ADD to the new .GRP values. Here's an example

Setting up the data

library(data.table)

dt <- fread('YEAR   NAME   ST
2010   AAA    AK
2011   AAA    AK
2012   AAA    AK
2010   BBB    AL
2011   BBB    AL
2010   CCC    CA')

## Assign GRP IDs
dt[, ID := .GRP, .(NAME, ST)]

## New data
dt1 <- fread('YEAR   NAME   ST
2010   AAA    AK
2011   AAA    AK
2012   AAA    AK
2010   BBB    AL
2011   BBB    AL
2010   CCC    CA
2010   DDD    LA
2011   DDD    LA
2015   DDD    LA
2016   DDD    LA
2010   EEE    GA
2011   EEE    GA
2014   FFF    PA ')

Now we have two data.tables, the original dt, and a new one dt1. We can join them together to put the original ID values onto dt1.

## Attach the original ID values
dt1[
  dt
  , on = c("YEAR", "NAME", "ST")
  , ID := ID
]

Updating ID values

Then we can apply the same [ID := .GRP, .(NAME, ST)] on this new data where the ID value is NA, and add the maximum ID from the original data

## set the order so the NAs are at the bottom 
setorder(dt1, ST, NAME, YEAR, ID)

## keep a reference to the maximum Id already assigned, then add it to tne new groups
maxId <- max(dt1$ID, na.rm = T)
dt1[is.na(ID), ID := .GRP + maxId, .(NAME, ST) ]

dt1
#     YEAR NAME ST ID
# 1:  2010  AAA AK  1
# 2:  2011  AAA AK  1
# 3:  2012  AAA AK  1
# 4:  2010  BBB AL  2
# 5:  2011  BBB AL  2
# 6:  2010  CCC CA  3
# 7:  2010  EEE GA  4
# 8:  2011  EEE GA  4
# 9:  2010  DDD LA  5
# 10: 2011  DDD LA  5
# 11: 2015  DDD LA  5
# 12: 2016  DDD LA  5
# 13: 2014  FFF PA  6

Upvotes: 3

Related Questions