Reputation: 1221
One normal way to fill in NA values in a data frame, loan, is as follows:
for (i in 1: ncol(loan))
{
if (is.character(loan[,i]))
{
loan[is.na(loan[ ,i]), i] <- "missing"
}
if (is.numeric(loan[,i]))
{
loan[is.na(loan[ ,i]), i] <- 9999
}
}
But if loan is a tibble, the above method does not work, because is.character(loan[,i]) and is.numeric(loan[,i]) are both always FALSE. The class of loan is shown below:
> class(loan)
[1] "tbl_df" "tbl" "data.frame"
To use the above for-loop for filling in missing values, I have to first convert 'loan' to a data frame with as.data.frame() and then run the loop.
Is it possible to directly manipulate a tibble without first converting it to a data.frame to fill in missing values?
Upvotes: 2
Views: 1496
Reputation: 887118
We can use tidyverse syntax to do this:
library(tidyverse)
loan %>%
  # funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
  mutate_if(is.character, ~ replace(., is.na(.), "missing")) %>%
  mutate_if(is.numeric, ~ replace(., is.na(.), 9999))
# A tibble: 20 × 3
# Col1 Col2 Col3
# <chr> <dbl> <chr>
#1 a 9999 A
#2 a 2 A
#3 d 3 A
#4 c 9999 missing
#5 c 1 missing
#6 e 3 missing
#7 a 9999 A
#8 d 2 A
#9 d 3 A
#10 a 9999 A
#11 c 1 A
#12 b 1 C
#13 d 1 A
#14 d 9999 B
#15 a 4 B
#16 e 1 C
#17 a 3 A
#18 missing 3 A
#19 c 3 missing
#20 missing 4 missing
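On newer dplyr releases the same idea can be written with across(). The following is only a minimal sketch, assuming dplyr 1.0.0 or later; `toy` is a small made-up tibble used for illustration, not the loan data from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for `loan`
toy <- tibble(
  name   = c("a", NA, "c"),
  amount = c(1, NA, 3)
)

filled <- toy %>%
  mutate(
    # character columns: NA -> "missing"
    across(where(is.character), ~ replace(.x, is.na(.x), "missing")),
    # numeric columns: NA -> 9999
    across(where(is.numeric), ~ replace(.x, is.na(.x), 9999))
  )

filled
```

The result is still a tibble, so no round trip through as.data.frame() is involved.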
As the dataset is a tibble, extracting a column with [ does not convert it to a vector; we need [[ instead:
for (i in seq_len(ncol(loan))) {
  # [[ returns the column as a plain vector, so the type checks work
  if (is.character(loan[[i]])) {
    loan[is.na(loan[[i]]), i] <- "missing"
  } else if (is.numeric(loan[[i]])) {
    loan[is.na(loan[[i]]), i] <- 9999
  }
}
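If the same loop has to run on both plain data frames and tibbles, one option is to wrap it in a function that reads and writes whole columns with [[. This is a sketch only: `fill_na()`, its arguments `chr_fill` and `num_fill`, and the `toy` data are hypothetical names made up for this example, not part of any package.

```r
library(tibble)

# fill_na() is a hypothetical helper, not part of any package
fill_na <- function(df, chr_fill = "missing", num_fill = 9999) {
  for (i in seq_along(df)) {
    col <- df[[i]]                 # [[ returns the column as a vector
    if (is.character(col)) {
      col[is.na(col)] <- chr_fill
    } else if (is.numeric(col)) {
      col[is.na(col)] <- num_fill
    }
    df[[i]] <- col                 # write the whole column back
  }
  df
}

toy <- tibble(name = c("a", NA), amount = c(NA, 2))

fill_na(toy)                  # input is a tibble, result is a tibble
fill_na(as.data.frame(toy))   # input is a data.frame, result is a data.frame
```

Because the NA replacement happens on the extracted vector, the function never indexes rows of the tibble itself, so the difference between [ on a tibble and on a data frame does not come into play.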
To understand the problem, we just need to look at the output of each extraction:
head(is.na(loan[,1]))
# Col1
#[1,] FALSE
#[2,] FALSE
#[3,] FALSE
#[4,] FALSE
#[5,] FALSE
#[6,] FALSE
head(is.na(loan[[1]]))
#[1] FALSE FALSE FALSE FALSE FALSE FALSE
In the for loop, the first case uses a one-column logical matrix as the row index, while the second case uses a logical vector, and that is what makes the difference.
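The type checks from the question can be compared side by side in the same way. This is a small illustration on a made-up one-column tibble (`toy` and `df` are names assumed for this example only):

```r
library(tibble)

toy <- tibble(name = c("a", NA, "c"))   # hypothetical one-column tibble
df  <- as.data.frame(toy)               # the same data as a plain data.frame

is.character(df[, 1])                   # TRUE: a data.frame drops to a vector
is.character(toy[, 1])                  # FALSE: still a one-column tibble
is.character(toy[[1]])                  # TRUE: [[ extracts the vector
is.character(toy[, 1, drop = TRUE])     # TRUE: explicit drop also gives a vector
```

So the original loop fails on a tibble because [ keeps the tibble structure unless drop = TRUE is passed, whereas [[ always returns the underlying vector.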
# Data used in the examples above
set.seed(24)
loan <- as_tibble(data.frame(
  Col1 = sample(c(NA, letters[1:5]), 20, replace = TRUE),
  Col2 = sample(c(NA, 1:4), 20, replace = TRUE),
  Col3 = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = FALSE
))
Upvotes: 2