Reputation: 89067
In R, almost every is.* function I can think of has a corresponding as.* function. There is an is.na but no as.na. Why not, and how would you implement one if such a function makes sense?
I have a vector x that can be logical, character, integer, numeric or complex, and I want to convert it to a vector of the same class and length, but filled with the appropriate missing value: NA, NA_character_, NA_integer_, NA_real_, or NA_complex_.
My current version:
as.na <- function(x) {x[] <- NA; x}
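As a quick sanity check, here is a minimal sketch that runs the version above on one small vector of each type. The `inputs` and `results` names are made up for illustration. It relies on the fact that assigning a logical NA into an existing vector does not change that vector's type.

```r
# The one-liner from the question
as.na <- function(x) {x[] <- NA; x}

# One small example vector per atomic type (illustrative values only)
inputs <- list(
  logical   = c(TRUE, FALSE),
  character = c("a", "b"),
  integer   = 1:2,
  numeric   = c(1.5, 2.5),
  complex   = c(1+2i, 3+4i)
)

results <- lapply(inputs, as.na)

# The storage type of each result matches its input
# (numeric vectors report typeof "double")
sapply(results, typeof)

# Every element of every result is missing
sapply(results, function(v) all(is.na(v)))
```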
Upvotes: 20
Views: 3821
Reputation: 638
Old question, but how about:
as.na <- function(obj){
  if(is.factor(obj)){
    # Special case for factors - any others that need to be handled?
    factor(rep(NA, length(obj)), levels = levels(obj))
  } else{
    objClass <- class(obj)
    x <- rep(NA, length(obj))
    class(x) <- objClass
    x
  }
}
For data frames:
DF <- data.frame(
int = seq(1, 10),
real = seq(1, 10) + 0.1,
char = letters[1:10],
logi = rep(c(TRUE, FALSE), 5),
Date = seq.Date(as.Date("2019-09-03"), by = 1, length.out = 10),
posix = seq.POSIXt(as.POSIXct("2019-09-03 12:00:00"), by = 360, length.out = 10),
stringsAsFactors = FALSE
)
DF$factr <- as.factor(LETTERS[1:10])
str(DF)
'data.frame': 10 obs. of 7 variables:
$ int : int 1 2 3 4 5 6 7 8 9 10
$ real : num 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
$ char : chr "a" "b" "c" "d" ...
$ logi : logi TRUE FALSE TRUE FALSE TRUE FALSE ...
$ Date : Date, format: "2019-09-03" "2019-09-04" "2019-09-05" ...
$ posix: POSIXct, format: "2019-09-03 12:00:00" "2019-09-03 12:06:00" "2019-09-03 12:12:00" ...
$ factr: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
DF_na <- DF
for(i in colnames(DF_na)){
  DF_na[,i] <- as.na(DF_na[,i])
}
str(DF_na)
'data.frame': 10 obs. of 7 variables:
$ int : int NA NA NA NA NA NA NA NA NA NA
$ real : num NA NA NA NA NA NA NA NA NA NA
$ char : chr NA NA NA NA ...
$ logi : logi NA NA NA NA NA NA ...
$ Date : Date, format: NA NA NA ...
$ posix: POSIXct, format: NA NA NA ...
$ factr: Factor w/ 10 levels "A","B","C","D",..: NA NA NA NA NA NA NA NA NA NA
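The column-by-column loop can also be written as a single lapply call. The sketch below is self-contained, so it uses a smaller, made-up data frame (`small_df`) and the one-line as.na from the question rather than the function defined in this answer. Treat it as an illustration of the pattern, not a drop-in replacement.

```r
# One-line as.na from the question; subassignment keeps class and levels
as.na <- function(x) {x[] <- NA; x}

# Small illustrative data frame (hypothetical, not the DF above)
small_df <- data.frame(
  int  = 1:3,
  char = c("a", "b", "c"),
  Date = as.Date("2019-09-03") + 0:2,
  stringsAsFactors = FALSE
)
small_df$factr <- factor(c("A", "B", "A"))

# Assigning into small_na[] keeps the data.frame structure
# while replacing every column
small_na <- small_df
small_na[] <- lapply(small_df, as.na)

str(small_na)
```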
For data.tables:
library(data.table)
DT <- data.table::data.table(
int = seq(1, 10),
real = seq(1, 10) + 0.1,
char = letters[1:10],
logi = rep(c(TRUE, FALSE), 5),
Date = seq.Date(as.Date("2019-09-03"), by = 1, length.out = 10),
posix = seq.POSIXt(as.POSIXct("2019-09-03 12:00:00"), by = 360, length.out = 10),
factr = as.factor(LETTERS[1:10])
)
str(DT)
Classes ‘data.table’ and 'data.frame': 10 obs. of 7 variables:
$ int : int 1 2 3 4 5 6 7 8 9 10
$ real : num 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
$ char : chr "a" "b" "c" "d" ...
$ logi : logi TRUE FALSE TRUE FALSE TRUE FALSE ...
$ Date : Date, format: "2019-09-03" "2019-09-04" "2019-09-05" ...
$ posix: POSIXct, format: "2019-09-03 12:00:00" "2019-09-03 12:06:00" "2019-09-03 12:12:00" ...
$ factr: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
- attr(*, ".internal.selfref")=<externalptr>
DT_na <- copy(DT)[, lapply(.SD, as.na)]
str(DT_na)
Classes ‘data.table’ and 'data.frame': 10 obs. of 7 variables:
$ int : int NA NA NA NA NA NA NA NA NA NA
$ real : num NA NA NA NA NA NA NA NA NA NA
$ char : chr NA NA NA NA ...
$ logi : logi NA NA NA NA NA NA ...
$ Date : Date, format: NA NA NA ...
$ posix: POSIXct, format: NA NA NA ...
$ factr: Factor w/ 10 levels "A","B","C","D",..: NA NA NA NA NA NA NA NA NA NA
- attr(*, ".internal.selfref")=<externalptr>
Upvotes: 0
Reputation: 23758
The function does not exist because it's not a type conversion. A type conversion would be changing 1L to 1.0, or changing "1" to 1L. The NA type isn't a conversion from another type unless that type was text. Given that there's only one type you could possibly convert from, and that there are so many options for assigning NA (as in the many other answers), there's no need for such a function.
Every one of the answers you've gotten would just assign NA to everything passed in to it, but you'd probably only want to do it conditionally. Doing the assignment conditionally or calling a small wrapper would be no different.
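To illustrate what a conditional assignment might look like, here is a small sketch with a made-up integer vector in which negative values are treated as missing. Both forms use only base R, and the vector keeps its integer type.

```r
x <- c(3L, -1L, 7L, -5L)   # illustrative data

# Option 1: logical subassignment; NA is coerced to NA_integer_
y <- x
y[y < 0L] <- NA

# Option 2: the replacement form of is.na with a logical index
z <- x
is.na(z) <- z < 0L

identical(y, z)
typeof(y)
```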
Upvotes: 11
Reputation: 162341
This seems to be consistently faster than your function:
as.na <- function(x) {
  rep(c(x[0], NA), length(x))
}
(Thanks to Joshua Ulrich for pointing out that my earlier version didn't preserve class attributes.)
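The trick is easier to see step by step: x[0] is a zero-length vector of the same type as x, and c() coerces the logical NA to that type before rep() repeats it. The variable names in this sketch are made up for illustration.

```r
x <- c("a", "b", "c")

empty  <- x[0]           # zero-length character vector
one_na <- c(empty, NA)   # a single NA, coerced to character
typeof(one_na)

# Repeating that single typed NA gives the full-length result
all_na <- rep(one_na, length(x))

# The same steps applied to a Date vector keep the Date class
d    <- as.Date("2019-09-03") + 0:2
d_na <- rep(c(d[0], NA), length(d))
class(d_na)
```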
Here, for the record, are some relative timings:
library(rbenchmark)
## The functions
flodel <- function(x) {x[] <- NA; x}
joshU <- function(x) {is.na(x) <- seq_along(x); x}
joshO <- function(x) rep(c(x[0], NA), length(x))
## Some vectors to test them on
int <- 1:1e6
char <- rep(letters[1:10], 1e5)
bool <- rep(c(TRUE, FALSE), 5e5)
benchmark(replications=100, order="relative",
flodel_bool = flodel(bool),
flodel_int = flodel(int),
flodel_char = flodel(char),
joshU_bool = joshU(bool),
joshU_int = joshU(int),
joshU_char = joshU(char),
joshO_bool = joshO(bool),
joshO_int = joshO(int),
joshO_char = joshO(char))[1:6]
# test replications elapsed relative user.self sys.self
# 7 joshO_bool 100 0.46 1.000 0.33 0.14
# 8 joshO_int 100 0.49 1.065 0.31 0.18
# 9 joshO_char 100 1.13 2.457 0.97 0.16
# 1 flodel_bool 100 2.31 5.022 2.01 0.30
# 2 flodel_int 100 2.31 5.022 2.00 0.31
# 3 flodel_char 100 2.64 5.739 2.36 0.28
# 4 joshU_bool 100 3.78 8.217 3.13 0.66
# 5 joshU_int 100 3.95 8.587 3.30 0.64
# 6 joshU_char 100 4.22 9.174 3.70 0.51
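Speed aside, it may be worth checking that the three functions agree. The sketch below, on small made-up inputs, suggests they return identical results for plain vectors. On a matrix, however, the rep()-based version returns a plain vector, because c() and rep() do not carry the dim attribute along.

```r
flodel <- function(x) {x[] <- NA; x}
joshU  <- function(x) {is.na(x) <- seq_along(x); x}
joshO  <- function(x) rep(c(x[0], NA), length(x))

# Plain vector: all three give the same typed NA vector
v <- 1:10
identical(flodel(v), joshU(v))
identical(flodel(v), joshO(v))

# Matrix: the subassignment-based versions keep dim, the rep() version does not
m <- matrix(1:4, nrow = 2)
dim(flodel(m))
dim(joshU(m))
dim(joshO(m))
```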
Upvotes: 13
Reputation: 176668
Why not use is.na<- as directed in ?is.na?
> l <- list(integer(10), numeric(10), character(10), logical(10), complex(10))
> str(lapply(l, function(x) {is.na(x) <- seq_along(x); x}))
List of 5
$ : int [1:10] NA NA NA NA NA NA NA NA NA NA
$ : num [1:10] NA NA NA NA NA NA NA NA NA NA
$ : chr [1:10] NA NA NA NA ...
$ : logi [1:10] NA NA NA NA NA NA ...
$ : cplx [1:10] NA NA NA ...
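The same approach appears to carry over to classed vectors. Below is a small sketch on a factor and a Date vector; the wrapper name `as_na_via_isna` and the sample values are made up for illustration.

```r
# Hypothetical wrapper around the is.na<- idiom shown above
as_na_via_isna <- function(x) {is.na(x) <- seq_along(x); x}

f <- factor(c("a", "b", "a"))
d <- as.Date("2019-09-03") + 0:2

f_na <- as_na_via_isna(f)
d_na <- as_na_via_isna(d)

levels(f_na)   # factor levels are retained
class(d_na)    # still a Date vector
```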
Upvotes: 15