Reputation: 137
Is there a fast way to convert all values in a column to numbers, regardless of the column's type? For example, if a column only had the values "Yes" and "No", they would be converted to 0 and 1; a column with the three values "a", "b" and "c" would be converted to 0, 1, 2, etc.
The data frame I am currently using has "Yes"/"No" values in its 9th column.
EDIT:
Using Moody_Mudskipper's suggestion, I have tried:
RawData1 <- as.matrix(as.numeric(factor(RawData[[9]], levels = c("Yes","No"))) - 1)
dput(head(df,10))
structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{",
" if (missing(ncp)) ", " .Call(C_df, x, df1, df2, log)",
" else .Call(C_dnf, x, df1, df2, ncp, log)", "}"), .Dim = c(6L,
1L), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), ""), class =
"noquote")
Upvotes: 1
Views: 6995
Reputation: 47320
You can use factors for this:
df <- data.frame(yn = sample(c("yes", "no"), 10, TRUE),
                 abc = sample(c("a", "b", "c"), 10, TRUE),
                 stringsAsFactors = FALSE)
# Explicit levels fix the mapping: first level -> 0, second level -> 1, ...
df$yn2 <- as.numeric(factor(df$yn, levels = c("yes", "no"))) - 1
df$abc2 <- as.numeric(factor(df$abc, levels = c("a", "b", "c"))) - 1
#     yn abc yn2 abc2
# 1   no   b   1    1
# 2  yes   b   0    1
# 3   no   b   1    1
# 4  yes   a   0    0
# 5  yes   c   0    2
# 6  yes   c   0    2
# 7  yes   c   0    2
# 8  yes   a   0    0
# 9   no   c   1    2
# 10 yes   b   0    1
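If the levels are not known in advance, the same idea can be wrapped in a small helper. This is only a minimal sketch: `to_codes` is a hypothetical name, not a base R function. Without an explicit `levels` argument, `factor()` uses the sorted unique values, so the codes follow that sorted order rather than the order of appearance. With explicit levels, any value that is not listed (for example "Yes" when the levels are "yes" and "no") becomes NA.

```r
# Hypothetical helper: map the distinct values of a vector to 0, 1, 2, ...
to_codes <- function(x, levels = NULL) {
  # Use the caller's level order if given, otherwise the sorted unique values
  f <- if (is.null(levels)) factor(x) else factor(x, levels = levels)
  as.integer(f) - 1L
}

yn <- c("Yes", "No", "No", "Yes")

# Default (sorted) levels: "No" = 0, "Yes" = 1
codes_sorted <- to_codes(yn)

# Explicit level order: "Yes" = 0, "No" = 1
codes_explicit <- to_codes(yn, levels = c("Yes", "No"))

# Levels that do not match the data (note the lower case): every value is NA
codes_mismatch <- to_codes(yn, levels = c("yes", "no"))
```

Checking `anyNA()` on the result is a cheap way to catch a mismatch between the levels and the actual values in the column.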
Upvotes: 1
Reputation: 18681
Another base R solution to convert all columns:
# Added a numeric column to @Moody_Mudskipper's data example
set.seed(1)
df <- data.frame(yn = sample(c("yes", "no"), 10, TRUE),
                 abc = sample(c("a", "b", "c"), 10, TRUE),
                 num = 1:10,
                 stringsAsFactors = FALSE)
# Convert every column to a factor, then to its (1-based) integer codes
df <- data.frame(lapply(df, function(x) as.numeric(as.factor(x))))
One issue with this though is that it gives:
   yn abc num
1   2   1   1
2   2   1   2
3   1   3   3
4   1   2   4
5   2   3   5
6   1   2   6
7   1   3   7
8   1   3   8
9   1   2   9
10  2   3  10
which is not what the OP wants, since factor/character variables should be converted to 0, 1, 2, 3, ... One could try this:
df <- data.frame(lapply(df, function(x) as.numeric(as.factor(x)) - 1))
but then every numeric column would also have 1 subtracted from it, which is incorrect. Using mutate_all (as in @CPak's answer) has the same issue. What you can do instead is use mutate_if to convert only the columns that are factors or characters:
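library(dplyr)
# df here is the original data frame (yn, abc, num), before the lapply() conversion above.
# funs() is deprecated since dplyr 0.8.0; a formula (~) does the same job.
df %>%
  mutate_if(function(x) is.factor(x) || is.character(x), ~ as.numeric(as.factor(.)) - 1)
# or this...
df %>%
  mutate_if(function(x) !is.numeric(x), ~ as.numeric(as.factor(.)) - 1)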
Now, columns are correctly converted:
   yn abc num
1   1   0   1
2   1   0   2
3   0   2   3
4   0   1   4
5   1   2   5
6   0   1   6
7   0   2   7
8   0   2   8
9   0   1   9
10  1   2  10
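The same selective conversion can also be done without dplyr. The following is a minimal base R sketch under a few assumptions: the small data frame and the name `to_convert` are made up for illustration, and the levels are left to `as.factor()`, which orders them by sorted value.

```r
# Small example data frame with fixed values (no random sampling)
df <- data.frame(yn  = c("yes", "no", "no", "yes"),
                 abc = c("a", "c", "b", "a"),
                 num = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)

# TRUE for character or factor columns; numeric columns are left alone
to_convert <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))

# Replace only the selected columns with zero-based codes
df[to_convert] <- lapply(df[to_convert], function(x) as.numeric(as.factor(x)) - 1)
```

Because `num` is never passed through `as.factor()`, its values stay exactly as they were.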
Upvotes: 0
Reputation: 13581
Moody's answer (+1) explains that you need to convert to factors and then to numeric. You can use mutate_all to change the class of all columns in your data frame:
library(dplyr)
# funs() is deprecated since dplyr 0.8.0; a formula (~) does the same job
df %>%
  mutate_all(~ as.numeric(as.factor(.)))
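In dplyr 1.0.0 and later, mutate_all is superseded by across(). A minimal sketch of the same conversion, assuming dplyr >= 1.0.0 is installed; the example data frame and the name `df_codes` are made up for illustration. As with mutate_all, the resulting codes start at 1, not 0.

```r
library(dplyr)

# Small example data frame with fixed values
df <- data.frame(yn  = c("yes", "no", "no", "yes"),
                 abc = c("a", "c", "b", "a"),
                 stringsAsFactors = FALSE)

# across(everything(), ...) applies the conversion to every column
df_codes <- df %>%
  mutate(across(everything(), ~ as.numeric(as.factor(.x))))
```

Replacing `everything()` with `where(is.character)` would restrict the conversion to character columns only.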
Upvotes: 1