Function to convert set of categorical variables to single vector

Question

There are many posts about creating dummy variables, but in my case I have a set of columns similar to dummy variables which need recoding back into one column.

Given a set of categorical/string variables (counties in the USA):

a <- c(NA, NA, "Cameron", "Luzerne")
b <- c(NA, "Luzerne", NA, NA)
c <- c("Chester", NA, NA, NA)
df <- as.data.frame(cbind(a, b, c))

How can I create a function that converts them to a single categorical column? The function should work for any contiguous set of string columns.

Result should look like this:

newcol    a         b         c
Chester                       Chester
Luzerne             Luzerne
Cameron   Cameron
Luzerne   Luzerne

I wrote this function, which takes three arguments:

cn<-function(df,s,f){
  for(i in seq_along(df[ ,c(s:f)]) )  # for specified columns in a dataframe...
  ifelse(is.na(df[,i]),NA,df[ ,i] )   # return value if not NA
  }

But it doesn't work, and a variety of similar attempts have failed as well.

The idea is to take a data frame with some number of string columns and move their values, if not blank, to the new column.

akrun · Accepted Answer

We can use coalesce from dplyr, which returns, for each row, the first non-NA value among its arguments

library(dplyr)
df %>%
    mutate_all(as.character) %>%
    mutate(newcolumn = coalesce(!!! .)) %>%
    select(newcolumn, everything())
#  newcolumn       a       b       c
#1   Chester    <NA>    <NA> Chester
#2   Luzerne    <NA> Luzerne    <NA>
#3   Cameron Cameron    <NA>    <NA>
#4   Luzerne Luzerne    <NA>    <NA>

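In base R, an option is pmax with na.rm = TRUE, which returns the parallel maximum of the non-NA strings in each row. Because each row here holds at most one distinct value, that maximum is the value itself; if a row held several different values, pmax would return the alphabetically last one rather than the first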

do.call(pmax, c(lapply(df, as.character), na.rm = TRUE))
#[1] "Chester" "Luzerne" "Cameron" "Luzerne"
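
The question asked for a function over a contiguous range of columns, so the base R idea can be wrapped up. The following is only a minimal sketch: the name collapse_cols is hypothetical, s and f are assumed to be the positions of the first and last column, and Reduce with ifelse is swapped in for pmax so that the first non-NA value (left to right) is kept, as coalesce does.

```r
# Hypothetical helper: collapse columns s..f of df into one character vector.
collapse_cols <- function(df, s, f) {
  # Convert each selected column to character (they may be factors)
  cols <- lapply(df[s:f], as.character)
  # Scan columns left to right, keeping the first non-NA value per row
  Reduce(function(x, y) ifelse(is.na(x), y, x), cols)
}

# Example data from the question
a <- c(NA, NA, "Cameron", "Luzerne")
b <- c(NA, "Luzerne", NA, NA)
c <- c("Chester", NA, NA, NA)
df <- as.data.frame(cbind(a, b, c))

df$newcol <- collapse_cols(df, 1, 3)
df$newcol
#[1] "Chester" "Luzerne" "Cameron" "Luzerne"
```

A row in which every selected column is NA yields NA in the result.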
