Marshall Gu

Reputation: 137

Convert All Variables in a Dataframe to Numbers

Is there a fast way to convert all variables in a data frame to numbers, regardless of variable type? I.e., if a column only has the values "Yes" and "No", they would be converted to 0 and 1; a column with the 3 values "a", "b" and "c" would be converted to 0, 1 and 2, and so on.

The data frame I am currently using has "Yes"/"No" values in its 9th column.

EDIT:

Using Moody_Mudskipper's suggestion, I have tried:

RawData1 <- as.matrix(as.numeric(factor(RawData[[9]], levels = c("Yes","No"))) - 1)

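Output of dput(head(df,10)), which (because no object named df was defined) shows the source of R's built-in df() density function instead of the data: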
structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{", 
"    if (missing(ncp)) ", "        .Call(C_df, x, df1, df2, log)", 
"    else .Call(C_dnf, x, df1, df2, ncp, log)", "}"), .Dim = c(6L, 
1L), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), ""), class = 
"noquote")

Upvotes: 1

Views: 6995

Answers (3)

moodymudskipper

Reputation: 47320

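You can use factors for this: convert each column to a factor with explicitly ordered levels, then to numeric, and subtract 1 so the codes start at 0: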

df <- data.frame(yn = sample(c("yes","no"), 10, replace = TRUE),
                 abc = sample(c("a","b","c"), 10, replace = TRUE),
                 stringsAsFactors = FALSE)

df$yn2 <- as.numeric(factor(df$yn,levels = c("yes","no"))) - 1
df$abc2 <- as.numeric(factor(df$abc,levels = c("a","b","c"))) - 1

#     yn abc yn2 abc2
# 1   no   b   1    1
# 2  yes   b   0    1
# 3   no   b   1    1
# 4  yes   a   0    0
# 5  yes   c   0    2
# 6  yes   c   0    2
# 7  yes   c   0    2
# 8  yes   a   0    0
# 9   no   c   1    2
# 10 yes   b   0    1
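The explicit levels argument is what fixes the mapping. Without it, factor() uses the sorted unique values as levels, so "no" comes before "yes" and gets code 0. A small check with a hypothetical vector x:

```r
x <- c("yes", "no", "yes")

# Default levels are the sorted unique values ("no", "yes"), so "no" -> 0
as.numeric(factor(x)) - 1
# [1] 1 0 1

# Explicit levels set the order: "yes" -> 0, "no" -> 1
as.numeric(factor(x, levels = c("yes", "no"))) - 1
# [1] 0 1 0
```

So if a particular value must map to 0, list it first in levels rather than relying on the default ordering.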

Upvotes: 1

acylam

Reputation: 18681

Another base R solution, which converts all columns:

# Added a numeric column to @Moody_Mudskipper's data example
set.seed(1)
df <- data.frame(yn = sample(c("yes","no"), 10, replace = TRUE),
                 abc = sample(c("a","b","c"), 10, replace = TRUE),
                 num = 1:10,
                 stringsAsFactors = FALSE)

df = data.frame(lapply(df, function(x) as.numeric(as.factor(x))))

One issue with this, though, is that it gives:

   yn abc num
1   2   1   1
2   2   1   2
3   1   3   3
4   1   2   4
5   2   3   5
6   1   2   6
7   1   3   7
8   1   3   8
9   1   2   9
10  2   3  10

This is not what the OP wants: factor/character variables should be coded starting from 0 (0, 1, 2, ...). One could try this:

df = data.frame(lapply(df, function(x) as.numeric(as.factor(x))-1))

but then 1 would also be subtracted from every numeric column, which is wrong. Using mutate_all (as in @CPak's answer) has a similar problem, since it converts the numeric columns as well. Instead, use mutate_if to convert only the columns that are factors or characters:

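library(dplyr)
# funs() is deprecated in current dplyr; a ~ formula lambda does the same job
df %>%
  mutate_if(function(x) is.factor(x) || is.character(x), ~ as.numeric(as.factor(.)) - 1)

# or, equivalently, convert every non-numeric column
df %>%
  mutate_if(function(x) !is.numeric(x), ~ as.numeric(as.factor(.)) - 1)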

Now the columns are converted correctly:

   yn abc num
1   1   0   1
2   1   0   2
3   0   2   3
4   0   1   4
5   1   2   5
6   0   1   6
7   0   2   7
8   0   2   8
9   0   1   9
10  1   2  10
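In dplyr 1.0.0 and later, mutate_if() is superseded by mutate(across(...)) with where(). A sketch of the same selective conversion in that style, assuming dplyr >= 1.0.0 is installed; it uses small fixed example data (not the seeded sample above) so the result is deterministic:

```r
library(dplyr)

df <- data.frame(yn  = c("yes", "no", "no", "yes"),
                 abc = c("a", "c", "b", "a"),
                 num = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)

# Recode only factor/character columns to 0-based codes;
# the numeric column num is left as it is
df2 <- df %>%
  mutate(across(where(function(x) is.factor(x) || is.character(x)),
                ~ as.numeric(as.factor(.x)) - 1))

df2
#   yn abc num
# 1  1   0  10
# 2  0   2  20
# 3  0   1  30
# 4  1   0  40
```

As with the mutate_if version, the codes follow the default (sorted) factor levels, so "no" becomes 0 and "yes" becomes 1.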

Upvotes: 0

CPak

Reputation: 13581

Moody's answer (+1) explains that you need to convert to factors first, then to numeric.

You can use mutate_all to apply that conversion to every column in your data frame:

library(dplyr)
# funs() is deprecated in current dplyr; a ~ formula lambda does the same job
df %>%
   mutate_all(~ as.numeric(as.factor(.)))
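Two things to keep in mind with as.numeric(as.factor(.)) applied to every column: the codes start at 1 rather than 0, and numeric columns are converted too, so each value is replaced by its position among the sorted unique values. A small base R illustration with hypothetical vectors, followed by one way to convert only the non-numeric columns to 0-based codes:

```r
# A numeric column loses its values: 2.5, 100, 7 become codes 1, 3, 2
num <- c(2.5, 100, 7)
as.numeric(as.factor(num))
# [1] 1 3 2

# Base R: recode only factor/character columns, starting at 0
df <- data.frame(yn = c("yes", "no"), num = c(2.5, 100),
                 stringsAsFactors = FALSE)
is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
df[is_cat] <- lapply(df[is_cat], function(x) as.numeric(as.factor(x)) - 1)
df$yn   # 1 0
df$num  # 2.5 100 (unchanged)
```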

Upvotes: 1
