flodel

Reputation: 89067

`as.na` function

In R, almost every is.* function I can think of has a corresponding as.*. There is an is.na but no as.na. Why not, and how would you implement one if such a function makes sense?

I have a vector x that can be logical, character, integer, numeric or complex, and I want to convert it to a vector of the same class and length, but filled with the appropriate missing value: NA, NA_character_, NA_integer_, NA_real_, or NA_complex_.

My current version:

as.na <- function(x) {x[] <- NA; x}
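
To make the requirement concrete, here is a small check of the one-liner above against each of the five atomic types listed. The example vectors in `inputs` are made up for illustration; this is a sketch of how one might verify the behaviour, not part of the original post.

```r
# The definition from the question, repeated so the block runs on its own.
as.na <- function(x) { x[] <- NA; x }

# One illustrative vector per atomic type mentioned in the question.
inputs <- list(
  logical   = c(TRUE, FALSE),
  character = c("a", "b"),
  integer   = 1:2,
  numeric   = c(1.5, 2.5),
  complex   = c(1+2i, 3+4i)
)

results <- lapply(inputs, as.na)

# The storage type of each result matches that of its input ...
types_match <- mapply(function(x, y) identical(typeof(x), typeof(y)),
                      inputs, results)
# ... and every element of every result is missing.
all_missing <- vapply(results, function(x) all(is.na(x)), logical(1))

print(types_match)
print(all_missing)
```

Because `x[] <- NA` assigns into the existing vector, the logical NA is coerced to the type of `x` rather than the other way round.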

Upvotes: 20

Views: 3821

Answers (4)

JAllen

Reputation: 638

Old question, but how about:

as.na <- function(obj){
  if(is.factor(obj)){
    # Special case for factors - any others that need to be handled?
    factor(rep(NA, length(obj)), levels = levels(obj))
  } else{
    objClass <- class(obj)
    x <- rep(NA, length(obj))
    class(x) <- objClass 
    x
  }
}

For dataframes:

DF <- data.frame(
  int = seq(1, 10),
  real = seq(1, 10) + 0.1,
  char = letters[1:10],
  logi = rep(c(TRUE, FALSE), 5),
  Date = seq.Date(as.Date("2019-09-03"), by = 1, length.out = 10),
  posix = seq.POSIXt(as.POSIXct("2019-09-03 12:00:00"), by = 360, length.out = 10),
  stringsAsFactors = FALSE
) 
DF$factr <- as.factor(LETTERS[1:10])
str(DF)
'data.frame':   10 obs. of  7 variables:
 $ int  : int  1 2 3 4 5 6 7 8 9 10
 $ real : num  1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
 $ char : chr  "a" "b" "c" "d" ...
 $ logi : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
 $ Date : Date, format: "2019-09-03" "2019-09-04" "2019-09-05" ...
 $ posix: POSIXct, format: "2019-09-03 12:00:00" "2019-09-03 12:06:00" "2019-09-03 12:12:00" ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10

DF_na <- DF
for(i in colnames(DF_na)){
  DF_na[,i] <- as.na(DF_na[,i])
}

str(DF_na)
'data.frame':   10 obs. of  7 variables:
 $ int  : int  NA NA NA NA NA NA NA NA NA NA
 $ real : num  NA NA NA NA NA NA NA NA NA NA
 $ char : chr  NA NA NA NA ...
 $ logi : logi  NA NA NA NA NA NA ...
 $ Date : Date, format: NA NA NA ...
 $ posix: POSIXct, format: NA NA NA ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: NA NA NA NA NA NA NA NA NA NA
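
As an aside, the column loop can also be written with `lapply()` and an empty-bracket assignment, which keeps the data frame structure intact. The sketch below uses the one-line `as.na` from the question rather than the version in this answer, and a smaller, made-up data frame, so treat it as an illustration under those assumptions.

```r
# Assumption: the one-line as.na from the question, not the version above.
as.na <- function(x) { x[] <- NA; x }

# A small illustrative data frame (hypothetical data).
DF <- data.frame(
  int  = 1:3,
  char = letters[1:3],
  stringsAsFactors = FALSE
)
DF$factr <- factor(c("A", "B", "C"))

# DF_na[] <- ... replaces every column while keeping the data frame itself.
DF_na <- DF
DF_na[] <- lapply(DF_na, as.na)

str(DF_na)
```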

For data.tables:

library(data.table)

DT <- data.table::data.table(
  int = seq(1, 10),
  real = seq(1, 10) + 0.1,
  char = letters[1:10],
  logi = rep(c(TRUE, FALSE), 5),
  Date = seq.Date(as.Date("2019-09-03"), by = 1, length.out = 10),
  posix = seq.POSIXt(as.POSIXct("2019-09-03 12:00:00"), by = 360, length.out = 10),
  factr = as.factor(LETTERS[1:10])
)
str(DT)
Classes ‘data.table’ and 'data.frame':  10 obs. of  7 variables:
 $ int  : int  1 2 3 4 5 6 7 8 9 10
 $ real : num  1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
 $ char : chr  "a" "b" "c" "d" ...
 $ logi : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
 $ Date : Date, format: "2019-09-03" "2019-09-04" "2019-09-05" ...
 $ posix: POSIXct, format: "2019-09-03 12:00:00" "2019-09-03 12:06:00" "2019-09-03 12:12:00" ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
 - attr(*, ".internal.selfref")=<externalptr> 

DT_na <-
  copy(DT)[, lapply(.SD, as.na)]
str(DT_na)
Classes ‘data.table’ and 'data.frame':  10 obs. of  7 variables:
 $ int  : int  NA NA NA NA NA NA NA NA NA NA
 $ real : num  NA NA NA NA NA NA NA NA NA NA
 $ char : chr  NA NA NA NA ...
 $ logi : logi  NA NA NA NA NA NA ...
 $ Date : Date, format: NA NA NA ...
 $ posix: POSIXct, format: NA NA NA ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: NA NA NA NA NA NA NA NA NA NA
 - attr(*, ".internal.selfref")=<externalptr> 

Upvotes: 0

John

Reputation: 23758

The function does not exist because it's not a type conversion. A type conversion would be changing 1L to 1.0, or changing "1" to 1L. The NA type isn't a conversion from another type unless that type was text. Given that there's only one type you could possibly convert from and there are so many options for doing an assignment of NA (as in the many other answers) there's no need for such a function.

Every one of the answers you've gotten would just assign NA to everything passed into it, but you'd probably only want to do that conditionally. Doing the assignment conditionally is no more work than calling a small wrapper.
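
A minimal sketch of the conditional assignment described here. The vector `x` and the rule "negative values are invalid" are made up for illustration.

```r
# Hypothetical data: negative readings are treated as invalid.
x <- c(3L, -1L, 7L, -5L)

# Plain indexed assignment; NA is coerced to the type of x (integer here).
x[x < 0L] <- NA

print(x)
print(typeof(x))
```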

Upvotes: 11

Josh O'Brien

Reputation: 162341

This seems to be consistently faster than your function:

as.na <- function(x) {
    rep(c(x[0], NA), length(x))
}

(Thanks to Joshua Ulrich for pointing out that my earlier version didn't preserve class attributes.)
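
For readers wondering why this works, the steps can be taken apart as below. This is only an illustrative breakdown with a made-up character vector; the intermediate names `empty` and `one_na` are not part of the answer.

```r
x <- c("a", "b", "c")          # illustrative input

# Step 1: a zero-length vector that still carries the type of x.
empty <- x[0]                  # character(0)

# Step 2: c() coerces the logical NA to that type, giving a single typed NA.
one_na <- c(empty, NA)

# Step 3: repeat the typed NA to the original length.
out <- rep(one_na, length(x))

print(typeof(one_na))
print(out)
```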


Here, for the record, are some relative timings:

library(rbenchmark)

## The functions
flodel <- function(x) {x[] <- NA; x}
joshU <- function(x) {is.na(x) <- seq_along(x); x}
joshO <- function(x) rep(c(x[0], NA), length(x))

## Some vectors to test them on
int  <- 1:1e6
char <- rep(letters[1:10], 1e5)
bool <- rep(c(TRUE, FALSE), 5e5)

benchmark(replications=100, order="relative",
    flodel_bool = flodel(bool),
    flodel_int  = flodel(int),
    flodel_char = flodel(char),
    joshU_bool = joshU(bool),
    joshU_int  = joshU(int),
    joshU_char = joshU(char),
    joshO_bool = joshO(bool),
    joshO_int  = joshO(int),
    joshO_char = joshO(char))[1:6]        
#          test replications elapsed relative user.self sys.self
# 7  joshO_bool          100    0.46    1.000      0.33     0.14
# 8   joshO_int          100    0.49    1.065      0.31     0.18
# 9  joshO_char          100    1.13    2.457      0.97     0.16
# 1 flodel_bool          100    2.31    5.022      2.01     0.30
# 2  flodel_int          100    2.31    5.022      2.00     0.31
# 3 flodel_char          100    2.64    5.739      2.36     0.28
# 4  joshU_bool          100    3.78    8.217      3.13     0.66
# 5   joshU_int          100    3.95    8.587      3.30     0.64
# 6  joshU_char          100    4.22    9.174      3.70     0.51

Upvotes: 13

Joshua Ulrich

Reputation: 176668

Why not use `is.na<-`, as directed in `?is.na`?

> l <- list(integer(10), numeric(10), character(10), logical(10), complex(10))
> str(lapply(l, function(x) {is.na(x) <- seq_along(x); x}))
List of 5
 $ : int [1:10] NA NA NA NA NA NA NA NA NA NA
 $ : num [1:10] NA NA NA NA NA NA NA NA NA NA
 $ : chr [1:10] NA NA NA NA ...
 $ : logi [1:10] NA NA NA NA NA NA ...
 $ : cplx [1:10] NA NA NA ...
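
The same replacement function also works on part of a vector, and it has a method for factors that keeps the levels. A short sketch, where the name `as_na` and the example values are chosen only for illustration:

```r
# Wrap the idiom above in a named function (name is illustrative).
as_na <- function(x) { is.na(x) <- seq_along(x); x }

# Partial use: mark only the second element as missing.
x <- c(1.5, 2.5, 3.5)
is.na(x) <- 2
print(x)

# Factors: the values become NA, the levels are kept.
f <- factor(c("a", "b", "a"))
f_na <- as_na(f)
print(levels(f_na))
```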

Upvotes: 15
