Insert new row based on condition

Question

I would like to insert efficiently (as I have 1mil. row df) new row if given id occurs only once.

Here sample data and desired result:

df <- data.frame(id=c(1,1,2,6,6,6,101),year=c(2012,2013,2014,2011,2012,2013,2015),value1=rnorm(7),value2=rexp(7),value3=rpois(7,2))


df <- transform(df, rec=ave(df$id, df$id, FUN=seq_along))

dfL <- df[!duplicated(df$id, fromLast=TRUE), ]
dfL[dfL$rec==1, ][-ncol(dfL)]

The desired result for the above already selected 2 rows:

id year    value1    value2 value3
2  2013      ""        ""     ""
2  2014   0.7345321 0.1449742  3
101 2014   ""          ""     ""
101 2015  0.2547620 0.7166841  1

For the whole data set it would be:

id  year  value1     value2      value3 
1   2012 -1.38007779 2.97957329      2  
1   2013 -0.03257949 0.92996719      1   
2   2013  ""         ""              ""
2   2014  0.73453214 0.14497416      3   
6   2011 -1.04242748 0.03099297      0   
6   2012  0.35932001 0.90399050      3   
6   2013  1.18300395 0.29446006      1   
101 2014  ""        ""              ""
101 2015  0.25476203 0.71668415      1

David Arenburg · Accepted Answer

You could try something as the following

library(data.table); 
setDT(df)[, if(.N < 2L) {
               .SD[c(NA_real_, 1L)][, year := c(year[2L] - 1L, year[2L])]
          } else .SD,
          by = id]
#     id year     value1     value2 value3
# 1:   1 2012 -1.6552064 0.05040943      4
# 2:   1 2013  1.3478321 1.08314748      2
# 3:   2 2013         NA         NA     NA
# 4:   2 2014  0.2320925 0.45546268      0
# 5:   6 2011  0.5110803 0.07412734      2
# 6:   6 2012  0.5170348 0.29052181      0
# 7:   6 2013  0.6925863 0.09309213      3
# 8: 101 2014         NA         NA     NA
# 9: 101 2015  1.6719612 0.84413431      4

This is basically checks per group if it's larger than 1 (.N denote the size of the group), if so it returns it as is (.SD denotes the SubData which is the data within a specific group), if not, it replicates the whole row twice (subsets .SD by its first value and some nonexistent NA value) for that id and fills the missing year (I'm assigning it to column year using the := operator while subsetting from the non-NA year (year[2] - 1L) one year an adding it with that year itself(year[2]))

Insert new row based on condition

Answers (2)

Related Questions