cs0815

Reputation: 17428

Create binary-encoded columns based on the levels stored in a data frame's factor metadata

I would like to create binary-encoded columns from factor columns. Various implementations exist, and they work fine for data frames with many rows (i.e. with at least one row for each level). In my use case, however, the factor levels may exist only as metadata in the data frame, without a row for every level.

So given a data frame like this:

haves <- data.frame(x = "a")
haves$x <- factor(as.character(haves$x), ordered = FALSE, levels = c(
        "a"
        , "b"
        , "c"
    ))

I would like to obtain this (based on 3 levels => ceiling(log2(3)) => 2 columns):

x bin_x_1 bin_x_2
a       0       0 

I made an attempt below, which does not fully work.

library(binaryLogic)

encode_binary <- function(x, name = "binary_") {
    x2 <- as.binary(unique(unclass(x)) - 1)
    maxlen <- ceiling(log2(nlevels(x)))
    x2 <- lapply(x2, function(y) {
        l <- length(y)
        if (l < maxlen) {
            y <- c(rep(0, (maxlen - l)), y)
        }
        y
    })
    d <- as.data.frame(t(as.data.frame(x2)))
    rownames(d) <- NULL
    colnames(d) <- paste0(name, 1:maxlen)
    d
}

haves <- data.frame(x = "a")
haves$x <- factor(as.character(haves$x), ordered = FALSE, levels = c(
        "a"
        , "b"
        , "c"
    ))

wants <- cbind(haves, encode_binary(haves[["x"]], name = "bin_x_"))
wants

PS:

ceiling(log2(n)) determines how many columns (bits) are required to encode n levels.
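
As a quick illustration of that rule, here is a minimal sketch that tabulates the number of bits for a few level counts. The variable names `n_levels` and `bits_needed` are made up for this example.

```r
# Number of bit columns needed for n levels, using ceiling(log2(n)).
# n_levels and bits_needed are illustrative names, not part of the question's code.
n_levels <- c(1, 2, 3, 4, 5, 8, 9)
bits_needed <- ceiling(log2(n_levels))
data.frame(n_levels, bits_needed)
# bits_needed: 0 1 2 2 3 3 4
```

Note the edge case: a factor with a single level needs zero columns under this rule.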

Upvotes: 1

Views: 105

Answers (1)

GKi

Reputation: 39737

You can use intToBits:

t(sapply(unclass(haves$x)-1, function(x) as.integer(intToBits(x)))[
  seq_len(ceiling(log2(nlevels(haves$x)))), , drop = FALSE])
#     [,1] [,2]
#[1,]    0    0
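
To see why the first rows of that matrix are the ones to keep, it may help to look at what intToBits() returns for a single value. A minimal sketch (the value 5 and the variable name `bits` are just illustrative):

```r
# intToBits() returns 32 raw bits, least significant bit first.
bits <- as.integer(intToBits(5L))
length(bits)                             # 32
bits[1:4]                                # 1 0 1 0  (5 is 101 in binary, read right to left)
sum(bits * 2^(seq_along(bits) - 1))      # 5, reconstructed from the bits
```

Because the least significant bit comes first, keeping the first ceiling(log2(nlevels)) entries is enough to cover every zero-based level index.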

The same approach wrapped in a function:

encode_binary <- function(x, name = "binary_") {
  x <- t(sapply(unclass(x)-1, function(x) as.integer(intToBits(x)))[
          seq_len(ceiling(log2(nlevels(x)))), , drop = FALSE])
  colnames(x) <- paste0(name, seq_len(ncol(x)))
  as.data.frame(x)
}
encode_binary(haves$x)
#  binary_1 binary_2
#1        0        0
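
One thing to be aware of: intToBits() puts the least significant bit first, so with this approach the first column is the lowest bit. The sketch below uses the same intToBits() idea and adds an optional most-significant-bit-first ordering. The function name `encode_binary_ordered` and the argument `msb_first` are hypothetical, introduced only for this illustration, and the sketch assumes the factor has at least one row.

```r
# Sketch only: encode_binary_ordered and msb_first are hypothetical names.
encode_binary_ordered <- function(x, name = "binary_", msb_first = FALSE) {
  n_bits <- ceiling(log2(nlevels(x)))
  codes <- as.integer(x) - 1L                       # zero-based level index
  # 32 x n matrix of bits, least significant bit in row 1
  bits <- sapply(codes, function(code) as.integer(intToBits(code)))
  bits <- bits[seq_len(n_bits), , drop = FALSE]     # keep only the needed bits
  if (msb_first) bits <- bits[rev(seq_len(n_bits)), , drop = FALSE]
  out <- as.data.frame(t(bits))                     # one row per observation
  colnames(out) <- paste0(name, seq_len(n_bits))
  out
}

# Usage with one row per level, to show the codes for all three levels
all_levels <- data.frame(x = factor(c("a", "b", "c"), levels = c("a", "b", "c")))
lsb <- encode_binary_ordered(all_levels$x, name = "bin_x_")
msb <- encode_binary_ordered(all_levels$x, name = "bin_x_", msb_first = TRUE)
cbind(all_levels, lsb)   # a -> 0 0, b -> 1 0, c -> 0 1
cbind(all_levels, msb)   # a -> 0 0, b -> 0 1, c -> 1 0
```

Which ordering is preferable depends on what consumes the columns; for a model input either is fine as long as it is applied consistently.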

Upvotes: 3
