Reputation: 17408
This is the output I want:
age colorred colorgreen colorblue
1 1 0 0
2 0 1 0
3 0 0 1
I can easily create this as long as the data frame contains enough rows to represent every level of the factor. I tend to use the dummies package, and this works:
library(dummies)
df <- data.frame(
age = c(1,2,3)
, color = c("red", "green", "blue")
)
df$color <- factor(as.character(df$color), ordered = FALSE, levels = c("red", "green", "blue"))
str(df)
df <- dummy.data.frame(df, names = c("color"))
df
However, if the data frame does not contain a row for every level, I do not get the required format:
library(dummies)
df <- data.frame(
age = 33
, color = "red"
)
df$color <- factor(as.character(df$color), ordered = FALSE, levels = c("red", "green", "blue"))
str(df)
df <- dummy.data.frame(df, names = c("color"))
df
Is it possible to bake the transformation into some kind of model, so that it still produces all the columns even if the data contains only one row?
Upvotes: 1
Views: 152
Reputation: 173928
You don't really need any packages to do this. In base R you could do:
my_columns <- c("red", "green", "blue")
df <- data.frame(
age = c(1,2,3),
color = c("red", "green", "blue")
)
cbind(age = df$age, `colnames<-`(as.data.frame(t(sapply(df$color,
function(x) as.numeric(x == my_columns)))), my_columns))
#> age red green blue
#> 1 1 1 0 0
#> 2 2 0 1 0
#> 3 3 0 0 1
df <- data.frame(
age = 33, color = "red"
)
cbind(age = df$age, `colnames<-`(as.data.frame(t(sapply(df$color,
function(x) as.numeric(x == my_columns)))), my_columns))
#> age red green blue
#> 1 33 1 0 0
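If the same encoding is needed in several places, the one-liner above can be wrapped in a small helper. This is only a sketch: `one_hot` is a hypothetical name, not a function from any package, and it swaps `sapply()` for `outer()`, which builds the same value-by-level comparison matrix in a single call.

```r
# Hypothetical helper (an assumption for illustration): one-hot encode a
# vector against a fixed set of levels supplied by the caller.
one_hot <- function(x, levels) {
  # outer() compares every value with every level, giving a
  # length(x) by length(levels) logical matrix, even for a single value.
  # Multiplying by 1 turns TRUE/FALSE into 1/0.
  mat <- outer(as.character(x), levels, `==`) * 1
  colnames(mat) <- levels
  as.data.frame(mat)
}

my_columns <- c("red", "green", "blue")
df <- data.frame(age = 33, color = "red")
res <- cbind(age = df$age, one_hot(df$color, my_columns))
res
```

Because the levels are passed in rather than read from the data, the result always has one column per level, regardless of how many rows are present.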
EDIT
A more complete solution, which processes multiple columns at once, can be achieved by writing a function to handle the logic:
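expand_factors <- function(df, columns)
{
  for(column in columns){
    # Character columns are converted to factors so that levels() is defined
    if(is.character(df[[column]])) df[[column]] <- factor(df[[column]])
    my_columns <- levels(df[[column]])
    # One row per observation, one 0/1 indicator per level
    mat <- t(sapply(df[[column]], function(x) as.numeric(x == my_columns)))
    new_cols <- setNames(as.data.frame(mat), my_columns)
    # Replace the original column with its indicator columns
    df <- cbind(df[which(names(df) != column)], new_cols)
  }
  df
}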
So that if I had this data frame:
df <- data.frame(age = 1:3,
shoe_size = 4:6,
colors = c("red", "green", "blue"),
fruits = c("apples", "bananas", "cherries"),
temp = factor(rep("cold", 3), levels = c("hot", "cold")))
df
#> age shoe_size colors fruits temp
#> 1 1 4 red apples cold
#> 2 2 5 green bananas cold
#> 3 3 6 blue cherries cold
Then I can expand all the factors I like by doing this:
expand_factors(df, c("colors", "fruits", "temp"))
#> age shoe_size blue green red apples bananas cherries hot cold
#> 1 1 4 0 0 1 1 0 0 0 1
#> 2 2 5 0 1 0 0 1 0 0 1
#> 3 3 6 1 0 0 0 0 1 0 1
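Note that `expand_factors()` takes the levels from the data itself, so a character column in a one-row data frame would produce only one indicator column unless it is first converted to a factor with all of its levels. A possible variant, sketched here under the assumption that the full set of levels is known in advance, takes them as an argument instead. The names `expand_with_levels` and `levels_list` are hypothetical.

```r
# Hypothetical variant of expand_factors(): `levels_list` is a named list
# mapping each column name to the full set of levels to expand it into.
expand_with_levels <- function(df, levels_list) {
  for (column in names(levels_list)) {
    lv <- levels_list[[column]]
    # Compare each value with each level; works for any number of rows
    mat <- outer(as.character(df[[column]]), lv, `==`) * 1
    new_cols <- setNames(as.data.frame(mat), lv)
    # Replace the original column with its indicator columns
    df <- cbind(df[names(df) != column], new_cols)
  }
  df
}

one_row <- data.frame(age = 33, colors = "red", fruits = "apples")
res <- expand_with_levels(one_row, list(
  colors = c("red", "green", "blue"),
  fruits = c("apples", "bananas", "cherries")
))
res
```

The columns come out in the order given in `levels_list`, rather than in the alphabetical order that `factor()` would choose.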
Created on 2020-08-20 by the reprex package (v0.3.0)
Upvotes: 2