Reputation: 17428
I would like to create "binary encoding columns" from factor columns. There are various code snippets out there that work fine for data frames with many rows (i.e. with at least one row for each level). In my use case, however, the factor levels may exist only as "meta data" in the data frame, without a row for every level.
So given a data frame like this:
haves <- data.frame(x = "a")
haves$x <- factor(as.character(haves$x), ordered = FALSE,
                  levels = c("a", "b", "c"))
I would like to obtain this (based on 3 levels => ceiling(log2(3)) => 2 columns):
x bin_x_1 bin_x_2
a       0       0
I made an attempt below, which does not fully work.
library(binaryLogic)
encode_binary <- function(x, name = "binary_") {
  x2 <- as.binary(unique(unclass(x)) - 1)
  maxlen <- ceiling(log2(nlevels(x)))
  x2 <- lapply(x2, function(y) {
    l <- length(y)
    if (l < maxlen) {
      y <- c(rep(0, (maxlen - l)), y)
    }
    y
  })
  d <- as.data.frame(t(as.data.frame(x2)))
  rownames(d) <- NULL
  colnames(d) <- paste0(name, 1:maxlen)
  d
}
haves <- data.frame(x = "a")
haves$x <- factor(as.character(haves$x), ordered = FALSE,
                  levels = c("a", "b", "c"))
wants <- cbind(haves, encode_binary(haves[["x"]], name = "bin_x_"))
wants
PS: ceiling(log2(n)) determines how many columns (bits) are required to encode n levels.
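To make the column count concrete, here is a minimal sketch in base R. The helper name `bits_needed` is made up for this illustration and is not part of any package.

```r
# Hypothetical helper: number of bits needed to give each of n levels
# its own binary code
bits_needed <- function(n) ceiling(log2(n))

bits_needed(3)  # 2
bits_needed(4)  # 2
bits_needed(5)  # 3
```

Note that a factor with a single level gives `ceiling(log2(1))`, which is 0, so no columns would be produced in that case.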
Upvotes: 1
Views: 105
Reputation: 39737
You can use intToBits:
t(sapply(unclass(haves$x) - 1, function(x) as.integer(intToBits(x)))[
  seq_len(ceiling(log2(nlevels(haves$x)))), ])
# [,1] [,2]
#[1,] 0 0
and as a function:
encode_binary <- function(x, name = "binary_") {
  x <- t(sapply(unclass(x) - 1, function(x) as.integer(intToBits(x)))[
    seq_len(ceiling(log2(nlevels(x)))), , drop = FALSE])
  colnames(x) <- paste0(name, seq_len(ncol(x)))
  as.data.frame(x)
}
encode_binary(haves$x)
#  binary_1 binary_2
#1        0        0
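As a usage sketch, the same approach can be applied to a factor that contains every level and bound back onto the data frame, which is what the question asks for. The function is repeated (with intermediate variables) so the snippet runs on its own; the names `all_levels`, `res`, `n_bits` and `bits` are made up for this example.

```r
# Same approach as above, repeated here so the snippet is self-contained
encode_binary <- function(x, name = "binary_") {
  n_bits <- ceiling(log2(nlevels(x)))
  # intToBits() returns 32 bits per value, least significant bit first
  bits <- sapply(unclass(x) - 1, function(i) as.integer(intToBits(i)))
  # keep only the first n_bits rows, then put one input value per row
  out <- t(bits[seq_len(n_bits), , drop = FALSE])
  colnames(out) <- paste0(name, seq_len(ncol(out)))
  as.data.frame(out)
}

# One row per level, to see every code
all_levels <- data.frame(x = factor(c("a", "b", "c"),
                                    levels = c("a", "b", "c")))
res <- cbind(all_levels, encode_binary(all_levels$x, name = "bin_x_"))
res
#   x bin_x_1 bin_x_2
# 1 a       0       0
# 2 b       1       0
# 3 c       0       1
```

Because intToBits returns the least significant bit first, bin_x_1 is the lowest bit. If the most significant bit should come first, as in the zero-padding of the question's attempt, indexing the rows with `rev(seq_len(n_bits))` instead of `seq_len(n_bits)` should give that order.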
Upvotes: 3