BSM

Reputation: 31

Replace NA in Factor type data in R


The data frame X looks like this

State      code
New Jersey  1
New York    2
Califronia  NA

All columns are factors. I want to replace the NA values with text or 0 so that I can transpose the data later.

When I try to run this command

X[is.na(X)] <- "0"

I get the following warnings:

Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
4: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated

The NA values remain unchanged.

Upvotes: 2

Views: 9347

Answers (4)

Karim Kanatov

Reputation: 91

Let's create a random data frame with a factor column that contains NA:

df <- data.frame(a=sample(0:10, size=10, replace=TRUE),
                 b=sample(20:30, size=10, replace=TRUE))
df[df$a==0,'a'] <- NA
df$a <- as.factor(df$a)

One way to do it is to add a new factor level and then assign that level to the NA entries:

#check levels
levels(df$a)
#[1] "3"  "4"  "7"  "9"  "10"

#add a new factor level (88 in this example)
df$a <- factor(df$a, levels=c(levels(df$a), "88"))

#replace all NAs with the new level
df$a[is.na(df$a)] <- "88"

#check levels again
levels(df$a)
#[1] "3"  "4"  "7"  "9"  "10" "88"
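
The same steps can be wrapped in a small helper so they are easy to reuse on any factor. This is only a sketch: the function name fill_na_level and its arguments are made up for illustration and are not part of base R.

```r
# Hypothetical helper (not part of base R): replace NA in a factor with a level.
fill_na_level <- function(f, new_level = "0") {
  stopifnot(is.factor(f))
  new_level <- as.character(new_level)
  # Register the level first; assigning an unknown level yields NA and a warning.
  if (!new_level %in% levels(f)) {
    levels(f) <- c(levels(f), new_level)
  }
  f[is.na(f)] <- new_level
  f
}

code <- factor(c("1", "2", NA))
filled <- fill_na_level(code, "0")
as.character(filled)  # "1" "2" "0"
levels(filled)        # "1" "2" "0"
```

Adding the level before assigning is what avoids the "invalid factor level, NA generated" warning from the question.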

Upvotes: 0

coffeinjunky

Reputation: 11514

Another alternative uses the built-in factor() function:

df <- data.frame(a=letters[1:3], b=c("d", "e", NA))
df
  a    b
1 a    d
2 b    e
3 c <NA>

Now recode the column with factor(), keeping NA as a level and labelling it "f":

df$b <- factor(df$b, exclude = NULL, 
               levels = c("d", "e", NA), 
               labels = c("d", "e", "f"))
df
  a b
1 a d
2 b e
3 c f

And for many factors, the following may be useful:

df[] <- lapply(df, function(x){
  # check if you have a factor first:
  if(!is.factor(x)) return(x)
  # otherwise include NAs into factor levels and change factor levels:
  x <- factor(x, exclude=NULL)
  levels(x)[is.na(levels(x))] <- "0"
  return(x)
  })
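
To try the multi-column version on data shaped like the question's, a sketch might look like this. The data frame X is rebuilt by hand here, and stringsAsFactors = TRUE is passed explicitly because R 4.0.0 and later no longer create factors by default.

```r
# Hand-built stand-in for the question's data frame (an assumption).
X <- data.frame(State = c("New Jersey", "New York", "California"),
                code  = c("1", "2", NA),
                stringsAsFactors = TRUE)

X[] <- lapply(X, function(x) {
  if (!is.factor(x)) return(x)
  x <- factor(x, exclude = NULL)        # keep NA as an explicit level
  levels(x)[is.na(levels(x))] <- "0"    # rename that level to "0"
  x
})

as.character(X$code)  # "1" "2" "0"
any(is.na(X))         # FALSE
```

Columns without NA, such as State, pass through unchanged because no level matches is.na().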

Upvotes: 4

JBGruber

Reputation: 12410

Simply:

X$code <- as.character(X$code) #go through character: as.numeric() directly on a factor returns the level codes
X[is.na(X)] <- "0"
X$code <- as.factor(as.numeric(X$code))
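
When turning a factor into numbers, keep in mind that as.numeric() applied directly to a factor returns the internal integer codes, not the values shown as labels. The two happen to agree for levels "1" and "2" as in the question, but not in general. A small illustration with made-up values:

```r
f <- factor(c("10", "20", NA))

codes  <- as.numeric(f)                # 1 2 NA   (internal level codes)
values <- as.numeric(as.character(f))  # 10 20 NA (the labels as numbers)
```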

In a loop over every column except the first (State), it would look like this:

for (i in 2:ncol(X)) {
  X[,i] <- as.character(X[,i])
  X[which(is.na(X[,i])==TRUE),i] <- "0"
  X[,i] <- as.factor(as.numeric(X[,i]))
}

And with a text replacement instead of a number:

for (i in 2:ncol(X)) {
  X[,i] <- as.character(X[,i])
  X[which(is.na(X[,i])==TRUE),i] <- "Not Assigned"
  X[,i] <- as.factor(X[,i])
}

Or if you prefer not to transform to character first, assign a new level to each column:

for (i in 2:ncol(X)) {
  levels(X[,i]) <- c(levels(X[,i]), "Not Assigned")
  X[which(is.na(X[,i])==TRUE),i] <- "Not Assigned"
}

Upvotes: 0

akaDrHouse

Reputation: 2240

The code you wrote will work for matrices, if you don't mind converting back and forth.

> X
       State code code2
1  NewJersey    1    NA
2    NewYork    2     0
3 Califronia   NA     4

> X<-as.matrix(X)
> X[is.na(X)] <- "0"
> X<-as.data.frame(X)
> X
       State code code2
1  NewJersey    1     0
2    NewYork    2     0
3 Califronia    0     4

> str(X)
'data.frame':   3 obs. of  3 variables:
 $ State: Factor w/ 3 levels "Califronia","NewJersey",..: 2 3 1
 $ code : Factor w/ 3 levels " 1"," 2","0": 1 2 3
 $ code2: Factor w/ 3 levels " 0"," 4","0": 3 1 2
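
For reference, here is a sketch of the same round trip on a data frame whose columns are all factors. The data frame is rebuilt by hand, and stringsAsFactors = TRUE is an assumption needed on R 4.0.0 and later to get factor columns. In the str() output above, code2 ends up with both " 0" and "0" as separate levels; that padding appears to come from numeric columns being formatted during as.matrix(), so trimming whitespace may be needed in that case.

```r
# Hand-built stand-in for the data frame shown above (an assumption).
X <- data.frame(State = c("NewJersey", "NewYork", "California"),
                code  = c("1", "2", NA),
                code2 = c(NA, "0", "4"),
                stringsAsFactors = TRUE)

M <- as.matrix(X)     # character matrix; factor labels become strings
M[is.na(M)] <- "0"    # plain strings accept any value
X2 <- as.data.frame(M, stringsAsFactors = TRUE)

as.character(X2$code)  # "1" "2" "0"
levels(X2$code2)       # "0" "4"
```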

Upvotes: 0
