Factor to integer on columns that don't all contain the same levels

Question

I have 150k columns of 105 million entries, each of which is either "none", "01", "12" or "2+". Unfortunately, not all columns contain all of the levels.

e.g.

library(magrittr)  # provides %>%
df <- data.frame(x1 = rep(c("none", "12", "2+"), each = 5),
                 x2 = rep(c("none", "01", "12"), each = 5)) %>%
  data.table::as.data.table()

so if I do

df$x1 <- as.integer(as.factor(df$x1))

I get the same integer codes as

df$x2 <- as.integer(as.factor(df$x2))

even though the two columns hold different values, which isn't what I'm after.
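
A minimal sketch of why the codes collide, using the same example values as plain vectors: as.factor() builds each column's levels from the sorted values present in that column, so the same string can receive a different code in each column. The names codes_x1 and codes_x2 are only illustrative.

```r
# Each column gets its own sorted level set, so the integer codes
# are not comparable across columns.
x1 <- rep(c("none", "12", "2+"), each = 5)
x2 <- rep(c("none", "01", "12"), each = 5)

levels(as.factor(x1))  # "12" "2+" "none"
levels(as.factor(x2))  # "01" "12" "none"

# "12" is code 1 in x1 but code 2 in x2, yet the code vectors match
codes_x1 <- as.integer(as.factor(x1))
codes_x2 <- as.integer(as.factor(x2))
identical(codes_x1, codes_x2)  # TRUE
```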

So I could do:

require(magrittr)
df$x1 <- factor(df$x1, levels = c("none", "01", "12", "2+")) %>% as.integer()
df$x2 <- factor(df$x2, levels = c("none", "01", "12", "2+")) %>% as.integer()

That does the job, but I have 150k columns and can't do the above one by one. What is the best way to deal with them?

akrun · Accepted Answer

To apply the same conversion to multiple columns, use across() with the full set of levels specified explicitly:

library(dplyr)
# fixed levels give every column the same codes: none = 1, 01 = 2, 12 = 3, 2+ = 4
df1 <- df %>%
    mutate(across(everything(), ~
      as.integer(factor(., levels = c("none", "01", "12", "2+")))))

To skip the first column, exclude it by index with -

df1 <- df %>%
    mutate(across(-1, ~
      as.integer(factor(., levels = c("none", "01", "12", "2+")))))

Or use base R

df[] <- lapply(df, function(x) as.integer(factor(x, levels = c("none", "01", "12", "2+"))))
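
Since the question's data is a data.table, the following is a hedged sketch of an in-place variant rather than part of the original answer. It assumes the columns are character vectors and uses match() against the shared level vector, which yields the same integer codes as as.integer(factor(x, levels = lv)). The names dt and lv are illustrative only.

```r
library(data.table)

lv <- c("none", "01", "12", "2+")  # shared level order, taken from the question

dt <- data.table(x1 = rep(c("none", "12", "2+"), each = 5),
                 x2 = rep(c("none", "01", "12"), each = 5))

# match() returns each value's position in lv; values not in lv become NA.
# set() replaces each column by reference, avoiding a copy of the whole table.
for (j in names(dt)) {
  set(dt, j = j, value = match(dt[[j]], lv))
}

unique(dt$x1)  # 1 3 4
unique(dt$x2)  # 1 2 3
```

With very wide tables, updating by reference may keep memory use lower than building a new list of columns, though that would need to be measured on the real data.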
