Reputation: 2194
I would like to set the first and the last value in a group to NA. Here is an example:
DT <- data.table(v = rnorm(12), class=rep(1:3, each=4))
DT[, v[c(1,.N)] := NA , by=class]
But this is not working. How can I do it?
Upvotes: 10
Views: 2557
Reputation: 7784
This may not be a one-liner, but it does have 'first' and 'last' in the code :)
> DT <- data.table(v = rnorm(12), class=rep(1:3, each=4))
> setkey(DT, class)
> classes = DT[, .(unique(class))]
> DT[classes, v := NA, mult='first']
> DT[classes, v := NA, mult='last']
> DT
          v class
 1:      NA     1
 2: -1.8191     1
 3: -0.6355     1
 4:      NA     1
 5:      NA     2
 6: -1.1771     2
 7: -0.8125     2
 8:      NA     2
 9:      NA     3
10:  0.2357     3
11:  0.3416     3
12:      NA     3
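setkey sorts stably, so rows that share a key value keep their original relative order and 'first' and 'last' still refer to the same rows as before keying. I think that is a documented (committed to) feature.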
Upvotes: 4
Reputation: 118779
At the moment, the way to go about this would be to first extract the indices, and then do one assignment by reference.
idx = DT[, .(idx = .I[c(1L, .N)]), by=class]$idx
DT[idx, v := NA]
I'll try and add this example to the Reference semantics vignette.
Upvotes: 12
Reputation: 22293
The canonical way to modify a subset of the data is to use i to define the subset. You cannot use [ together with := on the left-hand side. Either create a temporary i as suggested by @David Arenburg, or build the outcome vector yourself using a c(NA, v[-c(1, .N)], NA) construction.
DT[, v := c(NA, v[-c(1, .N)], NA)[1:.N], by = class]
Note, however, that the row order can change, e.g. when you set a new key or call any function that reorders the table, so which rows count as "first" and "last" depends on the current order. Be careful with this operation.
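To illustrate the ordering caveat, here is a minimal sketch that picks the first and last rows from an explicit ordering column rather than from the current row order. The id column is hypothetical (it is not part of the original example) and stands in for whatever defines the intended order in your data; set.seed is only there to make the illustration reproducible.

```r
library(data.table)

set.seed(1)  # only for a reproducible illustration
# `id` is a hypothetical column that records the intended within-group order
DT <- data.table(id = 1:12, v = rnorm(12), class = rep(1:3, each = 4))

# Reorder by reference: positions 1 and .N within each class now point at
# different rows than they did before
setorder(DT, class, -id)

# Select rows by the smallest and largest id per class instead of by position
idx <- DT[, .I[id %in% range(id)], by = class]$V1
DT[idx, v := NA]

# Viewed in the intended order, the first and last row of each class are NA
DT[order(id)]
```

Because the rows are selected by id, the same rows are set to NA whether or not the table has been re-keyed or reordered in the meantime.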
Upvotes: 0
Reputation: 7282
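With a helper function it's easy:
# return x with the positions in y set to NA
set.na = function(x, y) {x[y] = NA; x}
# assign the result back with := so that DT itself is modified
DT[, v := set.na(v, c(1, .N)), by=class]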
Upvotes: 1