user1827975

Reputation: 427

How do you replace a whole row of a data.table with NA?

This seems like something that should be easy but I can't figure it out.

>d=data.table(x=1:5,y=11:15,z=letters[1:5])
>d
   x  y z
1: 1 11 a
2: 2 12 b
3: 3 13 c
4: 4 14 d
5: 5 15 e

Now, I have decided that row 3 is bad data. I want all of those set to NA.

d[3,]<-NA

Warning message: In [<-.data.table(*tmp*, 3, , value = NA) : Coerced 'logical' RHS to 'character' to match the column's type. Either change the target column to 'logical' first (by creating a new 'logical' vector length 5 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'character' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.

Yet, it seems to work.

> d
    x  y  z
1:  1 11  a
2:  2 12  b
3: NA NA NA
4:  4 14  d
5:  5 15  e

If I convert to a data.frame, it also works, and without the warning. But then I need to convert back, which seems awkward. Is there a better way?

Upvotes: 5

Views: 4916

Answers (4)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

What about using ?set?

> d=data.table(x=1:5,y=11:15,z=letters[1:5])
> set(d, 3L, 1:3, NA_character_)
> d
    x  y  z
1:  1 11  a
2:  2 12  b
3: NA NA NA
4:  4 14  d
5:  5 15  e
> str(d)
Classes ‘data.table’ and 'data.frame':  5 obs. of  3 variables:
 $ x: int  1 2 NA 4 5
 $ y: int  11 12 NA 14 15
 $ z: chr  "a" "b" NA "d" ...
 - attr(*, ".internal.selfref")=<externalptr> 

Or, simply:

> d=data.table(x=1:5,y=11:15,z=letters[1:5])
> d[3] <- NA_character_
> str(d)
Classes ‘data.table’ and 'data.frame':  5 obs. of  3 variables:
 $ x: int  1 2 NA 4 5
 $ y: int  11 12 NA 14 15
 $ z: chr  "a" "b" NA "d" ...
 - attr(*, ".internal.selfref")=<externalptr> 

[ From Matthew ] Yes either set() is the way to go, or @mnel's answer is very neat :

DT[rownum, names(DT) := .SD[NA]]

On the presence or not of the coerce warning in the set approach, here's the internal code (modified here to convey the salient points). I seem to have had loss of precision (from double to integer) in mind when writing that, as well as inefficiency of coercing the RHS.

if( (isReal(RHS) && (TYPEOF(targetcol)==INTSXP || isLogical(targetcol))) ||
    (TYPEOF(RHS)==INTSXP && isLogical(targetcol)) ||
    (isString(targetcol))) {
    if (isReal(RHS)) s3="; may have truncated precision"; else s3="";
    warning("Coerced '%s' RHS to '%s' to match the column's type%s. ... <s3> ...
}

The full source of assign.c can be inspected here:
https://r-forge.r-project.org/scm/viewvc.php/pkg/src/assign.c?view=markup&root=datatable

There is a very similar feature request to improve this :

FR#2551 Singleton := RHS no coerce warning if no precision lost

Have added a link there back to this question.

In general, where data.table is overcautious in warning you about potential problems or inefficiencies (as in this case, where you want to set several columns of different types at once), wrapping the call in suppressWarnings() is another option.
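A minimal sketch of the suppressWarnings() route, using the question's example table. It assumes a plain logical NA is acceptable as the value for every column; whether a coercion warning is emitted at all may depend on the installed data.table version.

```r
library(data.table)

d <- data.table(x = 1:5, y = 11:15, z = letters[1:5])

# A plain logical NA is recycled across all columns of row 3; any coercion
# warning raised by the installed data.table version is silenced.
suppressWarnings(set(d, i = 3L, j = seq_along(d), value = NA))

# Inspect the column types after the assignment
str(d)
```

Note that suppressWarnings() hides every warning raised by the wrapped call, so it is best kept around this one statement only.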

Upvotes: 3

mnel

Reputation: 115382

To set by reference.

DT[rownum, (names(DT)) := lapply(.SD, function(x) {  .x <- x[1]; is.na(.x) <- 1L; .x})]

Or perhaps

DT[rownum, (names(DT)) := lapply(.SD[1,], function(x) { is.na(x) <- 1L; x})]

This ensures that the correct NA type is created for each column (factors and dates as well).

The second version indexes only once, which may be slightly faster if DT has many columns or if rownum selects a large subgroup of rows.

You could also do this (a variant on Roland's solution, but with no copying):

DT[rownum, (names(DT)) := .SD[NA]]
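As a quick check of the `.SD[NA]` form, here is a small sketch on a hypothetical table that also has a factor and a Date column. The table contents and the value of rownum are made up for illustration.

```r
library(data.table)

# Hypothetical example table: integer, character, factor and Date columns
DT <- data.table(
  x = 1:5,
  z = letters[1:5],
  f = factor(c("lo", "hi", "lo", "hi", "lo")),
  d = as.Date("2020-01-01") + 0:4
)
rownum <- 3L

# Blank the row by reference; .SD[NA] is a one-row table of typed NAs
DT[rownum, (names(DT)) := .SD[NA]]

# Inspect the column classes after the assignment
sapply(DT, class)
```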

Upvotes: 9

Roland

Reputation: 132596

Use the explicit NA types:

d[3,] <- list(NA_integer_, NA_integer_, NA_character_)

Another possibility:

d[3,] <- d[3,lapply(.SD,function(x) x[NA])]
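If writing out the NA types by hand is tedious, the list can be built from the columns themselves. This is only a sketch combining the idea above with set(); the name typed_na is made up for illustration.

```r
library(data.table)

d <- data.table(x = 1:5, y = 11:15, z = letters[1:5])

# Indexing a vector with NA_integer_ returns one NA of that vector's own type
typed_na <- lapply(d, function(col) col[NA_integer_])

# Assign the typed NAs to row 3 by reference, one value per column
set(d, i = 3L, j = names(d), value = typed_na)

str(d)
```

Because each NA already has its column's type, no coercion is needed during the assignment.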

Upvotes: 7

user1827975

Reputation: 427

Here is what I am doing now. It's OK, I guess, but still a little awkward.

na_datatable_row<-function(dtrow){
  # Take a row of a data.table and return a row of the same table,
  # but with all values set to NA.
  # DT[rownum,] <- NA works but throws an annoying warning,
  # so instead do DT[rownum,] <- na_datatable_row(DT[anyrow,]).
  # This preserves the right types.
  row=data.frame(dtrow)
  row[1,]<-NA
  return(data.table(row))
}

Upvotes: 0
