Reputation: 631
Suppose I have the following data table:
dt <- data.table(id = c(1, 1, 2, 2, 3, 3, 4, 4), v = 1:8)
id v
1: 1 1
2: 1 2
3: 2 3
4: 2 4
5: 3 5
6: 3 6
7: 4 7
8: 4 8
When I drop the rows with id = 2, I get the following:
dt <- dt[id != 2]
id v
1: 1 1
2: 1 2
3: 3 5
4: 3 6
5: 4 7
6: 4 8
At this point, how could I shift the values in id
so that there is no gap in the integer values in id
, like the following?
id v
1: 1 1
2: 1 2
3: 2 5
4: 2 6
5: 3 7
6: 3 8
Upvotes: 2
Views: 49
Reputation: 11255
Here's another approach that should be fast. Note: all of these methods assume the ID
column is sorted.
library(data.table)
dt <- data.table(id = c(1, 1, 2, 2, 3, 3, 4, 4), v = 1:8)
dt <- dt[id != 2]
dt[, id := cumsum(c(1L, diff(id) != 0L))]
dt
And a couple of more:
###rleid
dt <- data.table(id = c(1, 1, 2, 2, 3, 3, 4, 4), v = 1:8)
dt <- dt[id != 2]
dt[, id := rleid(id)]
###Rcpp
Rcpp::cppFunction(
"
IntegerVector re_index(IntegerVector x){
IntegerVector out(x.size());
int ind = 1;
out(0) = ind;
for (int i = 1; i < x.size(); i++) {
if (x(i) != x(i-1)) ind++;
out(i) = ind;
}
return(out);
}"
)
dt <- data.table(id = c(1, 1, 2, 2, 3, 3, 4, 4), v = 1:8)
dt <- dt[id != 2]
dt[, id := re_index(id)]
Upvotes: 2
Reputation: 389155
Here are a couple of options :
1) Using match
library(data.table)
dt[, id := match(id, unique(id))]
dt
# id v
#1: 1 1
#2: 1 2
#3: 2 5
#4: 2 6
#5: 3 7
#6: 3 8
2) With factor
dt[, id := as.integer(factor(id))]
EDIT: The above two methods by Ronak Shah are faster than the method in the accepted answer:
microbenchmark(
match = {
dt <- data.table(id = c(1, 1, 2, 2, 3, 3, 4, 4), v = 1:8)
dt <- dt[id != 2]
dt[, id := match(id, unique(id))]},
factor = {
dt <- data.table(id = c(1, 1, 2, 2, 3, 3, 4, 4), v = 1:8)
dt <- dt[id != 2]
dt[, id := as.integer(factor(id))]},
subtract = {
dt <- data.table(id = c(1, 1, 2, 2, 3, 3, 4, 4), v = 1:8)
dt <- dt[id != 2]
dt[get('id')>2, c('id') := get('id')-1]}
)
Unit: microseconds
expr min lq mean median uq max neval cld
match 510.401 538.3515 559.029 556.6010 578.0505 625.000 100 a
factor 564.402 585.0515 608.497 604.0015 622.2505 831.301 100 b
subtract 664.201 693.5510 724.059 716.6005 743.5515 1317.701 100 c
Upvotes: 3
Reputation: 6226
You can update by reference rows by a combination of a logical condition( id>2
) and :=
operator:
df[get('id')>2, c('id') := get('id')-1]
In English terms: on rows with id>2, replace id by id-1.
By the way, I put column names into string because if latter you want to generalize this function using id<-2, you will need to do df[get('id')>id, c('id') := get('id')-1]
. In your case it's extra formalism but you might find that useful later
Upvotes: 2