Reputation: 1595
I've been successful most of the time, but in one instance the following code throws an error.
Error: character string is not in a standard unambiguous format
current[is.na(current)] = ""
The following works, but how do I avoid writing it three times (once per column type)?
isnaColumns <- sapply(current, is.character)
current[,isnaColumns] <- lapply(current[,isnaColumns], function(z) replace(z, is.na(z), ""))
isnaColumns <- sapply(current, is.numeric)
current[,isnaColumns] <- lapply(current[,isnaColumns], function(z) replace(z, is.na(z), "" ))
isnaColumns <- sapply(current, is.logical)
current[,isnaColumns] <- lapply(current[,isnaColumns], function(z) replace(z, is.na(z), "" ))
Upvotes: 0
Views: 156
Reputation: 160817
I think an even better approach is to only update columns that make sense to update, as in character and possibly factor. The former is as simple as:
ischr <- sapply(current, is.character)
current[,ischr] <- lapply(current[,ischr], function(z) replace(z, is.na(z), ""))
(Apologies for the previous code that was exploding incorrectly ...)
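For the factor case mentioned above, here is a minimal sketch (not something I timed on the large data below). A factor can only hold values that are among its levels, so "" has to be added as a level before the replacement. The toy data frame and the name isfct are made up for illustration. List-style indexing, current[isfct], is used instead of current[,isfct] so that a single matching column is not dropped to a plain vector before lapply sees it.

```r
# Toy data for illustration only (not the asker's data).
current <- data.frame(
  int = c(1L, NA, 3L),
  chr = c("a", NA, "c"),
  fct = factor(c("x", NA, "z")),
  stringsAsFactors = FALSE
)

# isfct is a hypothetical name, analogous to ischr above.
isfct <- sapply(current, is.factor)
current[isfct] <- lapply(current[isfct], function(z) {
  levels(z) <- c(levels(z), "")   # "" must be a level before it can be assigned
  replace(z, is.na(z), "")
})
```

The integer and character columns are left alone here; only the factor column is changed, and it stays a factor.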
Testing with large-ish data:
n <- 1e7 # 10,000,000
set.seed(42) # R-4.0.2
current <- data.frame(
  int=sample(1000, size=n, replace=TRUE),
  chr1=sample(letters, size=n, replace=TRUE),
  chr2=sample(LETTERS, size=n, replace=TRUE),
  chr3=sample(letters, size=n, replace=TRUE),
  chr4=sample(LETTERS, size=n, replace=TRUE),
  chr5=sample(letters, size=n, replace=TRUE),
  chr6=sample(LETTERS, size=n, replace=TRUE)
)
ischr <- sapply(current, is.character)
ischr
#   int  chr1  chr2  chr3  chr4  chr5  chr6
# FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
current[,ischr] <- lapply(current[,ischr], function(z) replace(z, sample(n, size=n/10), NA))
head(current)
#   int chr1 chr2 chr3 chr4 chr5 chr6
# 1 561    y    Z    m    D    y    P
# 2 997    l    D    q <NA>    c    C
# 3 321    z    Q    a    E <NA>    H
# 4 153    n    K <NA>    C    h    P
# 5  74 <NA>    I    t    S    y    N
# 6 228    e    C    s    Z    q    L
system.time({
  current[,ischr] <- lapply(current[,ischr], function(z) replace(z, is.na(z), ""))
})
#    user  system elapsed
#    0.39    0.06    0.45
head(current)
#   int chr1 chr2 chr3 chr4 chr5 chr6
# 1 561    y    Z    m    D    y    P
# 2 997    l    D    q         c    C
# 3 321    z    Q    a    E         H
# 4 153    n    K         C    h    P
# 5  74         I    t    S    y    N
# 6 228    e    C    s    Z    q    L
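Since head() only shows the first six rows, a whole-frame check can be reassuring. A rough sketch on a small stand-in frame (the data, the injected NA positions, and the name na_counts are assumptions for illustration): count the remaining NAs per column and confirm that only the character columns were touched.

```r
# Small stand-in for the large frame above (illustration only).
set.seed(42)
current <- data.frame(
  int  = sample(1000, size = 20, replace = TRUE),
  chr1 = sample(letters, size = 20, replace = TRUE),
  chr2 = sample(LETTERS, size = 20, replace = TRUE),
  stringsAsFactors = FALSE
)
# Inject a few NAs at arbitrary positions.
current$int[3] <- NA
current$chr1[c(2, 5)] <- NA
current$chr2[7] <- NA

ischr <- sapply(current, is.character)
current[ischr] <- lapply(current[ischr], function(z) replace(z, is.na(z), ""))

# Per-column NA counts: character columns should now be 0,
# while the NA in the integer column is left as-is.
na_counts <- colSums(is.na(current))
na_counts
```

Leaving the integer column's NA in place is deliberate: assigning "" into it would coerce the whole column to character.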
Upvotes: 2