Ben

Reputation: 1144

Function to convert set of categorical variables to single vector

There are many posts about creating dummy variables, but in my case I have a set of columns similar to dummy variables which need recoding back into one column.

Given a set of categorical/string variables (counties in the USA):

a <- c(NA, NA, "Cameron", "Luzerne")
b <- c(NA, "Luzerne", NA, NA)
c <- c("Chester", NA, NA, NA)
df <- as.data.frame(cbind(a, b, c))

How can I create a function that converts them to a single categorical column? The function should work for any contiguous set of string columns.

Result should look like this:

newcol    a         b         c
Chester   <NA>      <NA>      Chester
Luzerne   <NA>      Luzerne   <NA>
Cameron   Cameron   <NA>      <NA>
Luzerne   Luzerne   <NA>      <NA>

I wrote this function, which takes three arguments (the data frame, plus the first and last column positions):

cn<-function(df,s,f){
  for(i in seq_along(df[ ,c(s:f)]) )  # for specified columns in a dataframe...
  ifelse(is.na(df[,i]),NA,df[ ,i] )   # return value if not NA
  }

But it doesn't work, and a variety of similar attempts have failed too.

The idea is to take a data frame with some number of string columns and move their values, if not blank, to the new column.

Upvotes: 2

Views: 533

Answers (1)

akrun

Reputation: 886938

We can use coalesce from dplyr, which returns the first non-NA value across its arguments at each position

library(dplyr)
df %>%
    mutate_all(as.character) %>%
    mutate(newcolumn = coalesce(!!! .)) %>%
    select(newcolumn, everything())
#   newcolumn       a       b       c
#1   Chester    <NA>    <NA> Chester
#2   Luzerne    <NA> Luzerne    <NA>
#3   Cameron Cameron    <NA>    <NA>
#4   Luzerne Luzerne    <NA>    <NA>
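
To get the three-argument function the question asks for, the same coalesce call could be wrapped as below. This is only a sketch: the name cn and the arguments s and f are taken from the question's attempt, and I am assuming s and f are column positions and that the new column should be called newcol.

```r
library(dplyr)

# Example data from the question
a <- c(NA, NA, "Cameron", "Luzerne")
b <- c(NA, "Luzerne", NA, NA)
c <- c("Chester", NA, NA, NA)
df <- as.data.frame(cbind(a, b, c))

# Sketch of a wrapper: coalesce columns s..f (by position) into `newcol`.
# The names cn, s, f and newcol are assumptions carried over from the question.
cn <- function(df, s, f) {
  # Convert the selected columns to character (they may be factors)
  cols <- lapply(df[s:f], as.character)
  # coalesce() takes the vectors as separate arguments, hence do.call();
  # unname() keeps the column names from being matched to named arguments
  df$newcol <- do.call(coalesce, unname(cols))
  # Put the new column first, followed by the original columns
  df[c("newcol", setdiff(names(df), "newcol"))]
}

out <- cn(df, 1, 3)
```

Calling cn(df, 1, 3) should give the same four values as the pipeline above, with newcol placed first.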

In base R, an option is pmax, which works here because each row has at most one non-NA value

do.call(pmax, c(lapply(df, as.character), na.rm = TRUE))
#[1] "Chester" "Luzerne" "Cameron" "Luzerne"
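
If a row could hold more than one non-NA value, pmax would return the one that sorts last rather than the first one from the left. A base R sketch that mimics coalesce in that case is below; the helper name first_non_na and the small df2 example are made up for illustration and are not part of the original data.

```r
# Hypothetical helper: first non-NA value per row across columns s..f,
# taken left to right, using base R only
first_non_na <- function(df, s, f) {
  cols <- lapply(df[s:f], as.character)
  # Fold the columns pairwise: keep x where it is present, otherwise take y
  Reduce(function(x, y) ifelse(is.na(x), y, x), cols)
}

# Illustrative data (not from the question): row 1 has two non-NA values
df2 <- data.frame(a = c("Adams", NA), b = c("York", "Berks"))

res <- first_non_na(df2, 1, 2)
# Row 1 keeps "Adams" (leftmost), whereas pmax would pick "York"
# because it sorts after "Adams"
```

On the question's df, first_non_na(df, 1, 3) should agree with the pmax result, since no row there has competing values.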

Upvotes: 2
