Reputation: 1237
I have a dataset that looks likes this:
Data01 <- data.table(
code=c("A111", "A111","A111","A111","A111", "A111","A111","A234", "A234","A234","A234","A234", "A234","A234"),
x=c("",126,126,"",836,843,843,126,126,"",127,836,843,843),
y=c("",76,76,"",456,465,465,76,76,"",77,456,465,465),
no1=c(028756, 028756,028756,057756, 057756, 057756, 057756,028756, 028756,057756,057756, 057756, 057756, 057756),
no2=c("","",034756,"","","",789165,"",034756,"","","","",789165)
)
Data01[, version := paste0("V", 1:.N), by = code]
Data01[, unique_version := paste(code, version, sep = "_")]
What I would like is a way of adding a column that for each unique code
entry says what the difference between each row and the previous one is (i.e. pasting the column name(s) where there are now different values). Something like this:
Data01[, change := c("First_entry","New_x_and_y","New_no2","New_x_and_y_and_no_1","New_x_and_y","New_x_and_y","New_no2","First_entry","New_no2","New_x_and_y_and_no1","New_x_and_y","New_x_and_y","New_x_and_y","New_no2")]
My actual dataset has 5.5 million rows and around 2.6 million unique code
entries so I would imagine any solution to this would take some time to complete. It would therefore really helpful to include some sort of progress indicator (like something suggested here: Progress bar in data.table aggregate action) if possible.
Upvotes: 1
Views: 159
Reputation: 25225
You can try something like this
nm <- c("x","y","no1","no2") #names(Data01)[-1L]
Data01[, change := c("First_entry",
sapply(seq_len(.N)[-1L], function(n) {
paste(c("New",
nm[which(unlist(.SD[n-1L]) != unlist(.SD[n]))]),
collapse="_")
})),
by=.(code)]
Upvotes: 1