Ujjawal Bhandari

Reputation: 1372

Converting factor variables of character format into numeric

I am trying to convert factor variables to numeric. I have tried both of these solutions:

as.numeric(levels(f))[f] 

as.numeric(as.character(f))

But the issue persists. Warning message: NAs introduced by coercion

Reproducible example -

df = data.frame(x = c("10: Already Delinquent 90+",
                      "11: Credit History <6 Months",
                      "12: Current Balance = 0",
                      "13: Balance (2-6)=0",
                      "20: 1+ x 90+",
                      "30: 3+ x 60-89",
                      "31: 2 x 60-89",
                      "32: 1 x 60-89",
                      "40: 3+ x 30-59",
                      "41: 2 x 30-59",
                      "42: 1 x 30-59",
                      "50: Insufficient Performance",
                      "60: 3+ x 1-29",
                      "61: 2 x 1-29",
                      "62: 1 x 1-29",
                      "70: Never delinquent"),
                y = c("00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA",
                      "00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA",
                      "00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA",
                      "00:Bad",
                      "01:Ind",
                      "02:Good",
                      "NA"),
                z = ceiling(rnorm(16)),
                stringsAsFactors = TRUE)  # needed in R >= 4.0, where strings are no longer factors by default

#Select all the factor variables
factorvars = colnames(df)[which(sapply(df,is.factor))]

#Concatenate with "_Num"
xxx <- paste(factorvars, "_Num", sep="")

#Converting Factor to Numeric
for (i in seq_along(factorvars)) {
  df[, xxx[i]] = NA
  # Still yields NA with a warning: the levels are not purely numeric strings
  df[, xxx[i]] = as.numeric(levels(df[, factorvars[i]]))[df[, factorvars[i]]]
}

I want to keep the factor variables and create new variables that hold the levels converted to numbers. The desired output looks like this:

x                             y        x_num  y_num
10: Already Delinquent 90+    00:Bad   1      1
11: Credit History <6 Months  01:Ind   2      2
12: Current Balance = 0       02:Good  3      3
13: Balance (2-6)=0           NA       4      NA
20: 1+ x 90+                  00:Bad   5      1
30: 3+ x 60-89                01:Ind   6      2
31: 2 x 60-89                 02:Good  7      3
32: 1 x 60-89                 NA       8      NA
40: 3+ x 30-59                00:Bad   9      1
41: 2 x 30-59                 01:Ind   10     2
42: 1 x 30-59                 02:Good  11     3
50: Insufficient Performance  NA       12     NA
60: 3+ x 1-29                 00:Bad   13     1
61: 2 x 1-29                  01:Ind   14     2
62: 1 x 1-29                  02:Good  15     3
70: Never delinquent          NA       16     NA

Upvotes: 0

Views: 260

Answers (1)

Pierre L

Reputation: 28441

Judging by your desired output, it doesn't look like you want to convert the factors to the numbers contained in their strings. Instead you want the internal representation of the factors.
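To make the distinction concrete, here is a minimal sketch on a toy factor (the vector `f` and its labels are made up for illustration and are not part of your data). It compares the internal integer codes with what you get when R tries to parse the labels as numbers. The last line assumes every label starts with digits followed by a colon, in case the embedded numbers are ever what you need.

```r
# Hypothetical toy factor, not taken from the question's data
f <- factor(c("10: A", "20: B", "10: A", "30: C"))

# Internal integer codes: the position of each value in levels(f)
codes <- as.numeric(f)

# Parsing the labels as numbers gives NA, because "10: A" is not a number
parsed <- suppressWarnings(as.numeric(as.character(f)))

# Leading number of each label (assumes the "<digits>: text" pattern)
prefix <- as.numeric(sub(":.*$", "", as.character(f)))
```

Here `codes` holds the level positions, `parsed` is all `NA` (the source of the coercion warning), and `prefix` holds the numbers written at the start of each label.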

Try this:

df[xxx] <- lapply(df[factorvars], as.numeric)  # list-style indexing also works with a single factor column
#                               x       y  z x_Num y_Num
# 1    10: Already Delinquent 90+  00:Bad  2     1     1
# 2  11: Credit History <6 Months  01:Ind  2     2     2
# 3       12: Current Balance = 0 02:Good  1     3     3
# 4           13: Balance (2-6)=0    <NA>  1     4    NA
# 5                  20: 1+ x 90+  00:Bad  0     5     1
# 6                30: 3+ x 60-89  01:Ind  0     6     2
# 7                 31: 2 x 60-89 02:Good  0     7     3
# 8                 32: 1 x 60-89    <NA>  0     8    NA
# 9                40: 3+ x 30-59  00:Bad  2     9     1
# 10                41: 2 x 30-59  01:Ind  0    10     2
# 11                42: 1 x 30-59 02:Good  0    11     3
# 12 50: Insufficient Performance    <NA>  1    12    NA
# 13                60: 3+ x 1-29  00:Bad  1    13     1
# 14                 61: 2 x 1-29  01:Ind -1    14     2
# 15                 62: 1 x 1-29 02:Good -1    15     3
# 16         70: Never delinquent    <NA> -1    16    NA

Data

Before running the conversion, I cleaned your example data by changing the character string "NA" to actual NA values:

is.na(df$y) <- df$y == "NA"
df$y <- droplevels(df$y)
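As a rough illustration of why this cleanup matters, the sketch below uses a small stand-alone factor `y` (a hypothetical stand-in for `df$y`, with its levels listed explicitly so the order does not depend on locale). Without the cleanup, the string "NA" is just another level and gets its own code instead of a missing value.

```r
# Hypothetical stand-in for df$y; levels given explicitly for a fixed order
y <- factor(c("00:Bad", "01:Ind", "02:Good", "NA"),
            levels = c("00:Bad", "01:Ind", "02:Good", "NA"))

# Before cleanup: "NA" is an ordinary level, so it receives a code
before <- as.numeric(y)

# Mark the "NA" strings as missing, then drop the now-unused level
is.na(y) <- y == "NA"
y <- droplevels(y)

# After cleanup: the missing entry stays NA in the numeric version
after <- as.numeric(y)
```

In this sketch `before` gives the fourth entry a code of 4, while `after` leaves it as `NA`, which matches the `y_Num` column shown above.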

Upvotes: 2
