Reputation: 427
This seems like something that should be easy but I can't figure it out.
>d=data.table(x=1:5,y=11:15,z=letters[1:5])
>d
x y z
1: 1 11 a
2: 2 12 b
3: 3 13 c
4: 4 14 d
5: 5 15 e
Now, I have decided that row 3 is bad data. I want all of those set to NA.
d[3,]<-NA
Warning message:
In `[<-.data.table`(`*tmp*`, 3, , value = NA) :
Coerced 'logical' RHS to 'character' to match the column's type. Either change the target column to 'logical' first (by creating a new 'logical' vector length 5 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'character' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.
Yet, it seems to work.
> d
x y z
1: 1 11 a
2: 2 12 b
3: NA NA NA
4: 4 14 d
5: 5 15 e
If I convert to data.frame, it also works but without the warning. But then I need to convert back which seems awkward. Is there a better way?
Upvotes: 5
Views: 4916
Reputation: 193507
What about using ?set?
> d=data.table(x=1:5,y=11:15,z=letters[1:5])
> set(d, 3L, 1:3, NA_character_)
> d
x y z
1: 1 11 a
2: 2 12 b
3: NA NA NA
4: 4 14 d
5: 5 15 e
> str(d)
Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
$ x: int 1 2 NA 4 5
$ y: int 11 12 NA 14 15
$ z: chr "a" "b" NA "d" ...
- attr(*, ".internal.selfref")=<externalptr>
Or, simply:
> d=data.table(x=1:5,y=11:15,z=letters[1:5])
> d[3] <- NA_character_
> str(d)
Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
$ x: int 1 2 NA 4 5
$ y: int 11 12 NA 14 15
$ z: chr "a" "b" NA "d" ...
- attr(*, ".internal.selfref")=<externalptr>
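As a hedged variation on the set() call above: rather than passing one NA_character_ for every column, a loop can give each column an NA of its own type. This is only a sketch; it assumes data.table is installed, and indexing with NA_integer_ is just one way to obtain a typed NA from an existing column.

```r
library(data.table)

d <- data.table(x = 1:5, y = 11:15, z = letters[1:5])

# For each column j, build a length-1 NA of that column's own type
# (indexing a vector with NA_integer_ returns one NA of the same type),
# then assign it into row 3 by reference.
for (j in seq_along(d)) {
  set(d, i = 3L, j = j, value = d[[j]][NA_integer_])
}

str(d)
```

Because the value already matches the column type, no coercion of the right-hand side is needed.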
[ From Matthew ] Yes, either set() is the way to go, or @mnel's answer is very neat:
DT[rownum, names(DT) := .SD[NA]]
On whether the coerce warning appears in the set() approach, here's the internal code (modified here to convey the salient points). I seem to have had loss of precision (from double to integer) in mind when writing that, as well as the inefficiency of coercing the RHS.
if( (isReal(RHS) && (TYPEOF(targetcol)==INTSXP || isLogical(targetcol))) ||
(TYPEOF(RHS)==INTSXP && isLogical(targetcol)) ||
(isString(targetcol))) {
if (isReal(RHS)) s3="; may have truncated precision"; else s3="";
warning("Coerced '%s' RHS to '%s' to match the column's type%s. ... <s3> ...
}
The full source of assign.c can be inspected here:
https://r-forge.r-project.org/scm/viewvc.php/pkg/src/assign.c?view=markup&root=datatable
There is a very similar feature request to improve this :
FR#2551 Singleton := RHS no coerce warning if no precision lost
Have added a link there back to this question.
In general, where data.table is overcautious in warning you about potential problems or inefficiencies, as in this case where you want to set several columns of different types at once, wrapping the call in suppressWarnings() is another way.
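A minimal sketch of the suppressWarnings() route, using the question's own table. It assumes data.table is installed; whether the coerce warning appears at all may depend on the data.table version, so treat this as an illustration rather than a recommendation.

```r
library(data.table)

d <- data.table(x = 1:5, y = 11:15, z = letters[1:5])

# suppressWarnings() evaluates the assignment in the calling environment,
# so d is still modified; any warning raised along the way is discarded.
suppressWarnings(d[3, ] <- NA)

str(d)
```

Note that this silences every warning raised by the wrapped expression, not only the coercion one, so it is best kept around a single small assignment.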
Upvotes: 3
Reputation: 115382
To set by reference.
DT[rownum, (names(DT)) := lapply(.SD, function(x) { .x <- x[1]; is.na(.x) <- 1L; .x})]
Or perhaps
DT[rownum, (names(DT)) := lapply(.SD[1,], function(x) { is.na(x) <- 1L; x})]
This ensures that the correct NA type is created for each column (including factors and dates).
The second version indexes only once, so it may be slightly faster if DT has many columns or rownum selects a large subset of rows.
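The type-preserving behaviour of is.na<- can be checked in plain R, without data.table. In the sketch below, typed_na is a hypothetical helper name introduced only for illustration; it applies the same steps as the anonymous function above to a single vector.

```r
# Hypothetical helper: take the first element of x and turn it into NA
# via `is.na<-`, which keeps the class and attributes of x.
typed_na <- function(x) {
  first <- x[1]
  is.na(first) <- 1L
  first
}

na_int  <- typed_na(1:3)
na_chr  <- typed_na(c("a", "b"))
na_fac  <- typed_na(factor(c("a", "b")))
na_date <- typed_na(as.Date("2020-01-01"))

class(na_int)    # integer
class(na_chr)    # character
class(na_fac)    # factor, levels are kept
class(na_date)   # Date
```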
You could also do this (a variant on Roland's solution, but with no copying):
DT[rownum, (names(DT)) := .SD[NA]]
Upvotes: 9
Reputation: 132596
Use the explicit NA types:
d[3,] <- list(NA_integer_, NA_integer_, NA_character_)
Another possibility:
d[3,] <- d[3,lapply(.SD,function(x) x[NA])]
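For context on why the explicit types help: a bare NA is a logical value, which is what triggers the coercion warning in the question. The base R sketch below shows the types involved, and why x[NA] is only safe here because each column in d[3, ] has length 1. The variable names are illustrative.

```r
# A bare NA is logical; the typed constants match the column types.
t_plain <- typeof(NA)             # "logical"
t_int   <- typeof(NA_integer_)    # "integer"
t_real  <- typeof(NA_real_)       # "double"
t_chr   <- typeof(NA_character_)  # "character"

# Indexing with a logical NA recycles it over the whole vector.
x   <- 11:15
one <- x[3]

len_one <- length(one[NA])  # 1: a single integer NA
len_all <- length(x[NA])    # 5: one NA per element of x
typ_one <- typeof(one[NA])  # "integer": the type is preserved
```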
Upvotes: 7
Reputation: 427
Here is what I am doing now. It's OK, I guess, but still a little awkward.
na_datatable_row<-function(dtrow){
#take a row of data.table and return a row of the same table but
#with all values set to NA
#DT[rownum,]<-NA works but throws an annoying warning
#so instead, do DT[rownum,]<-na_datatable_row(DT[anyrow,])
#this preserves the right types
row=data.frame(dtrow)
row[1,]<-NA
return(data.table(row))
}
Upvotes: 0