Emily Kothe

Reputation: 872

Concatenation across columns that removes NAs and uses the Oxford comma

I would like to implement concatenation across columns that removes NAs and observes the Oxford comma.

    x <-  data.frame(ID = 1:3,
                 col1 = c("snap", "snap", NA),
                 col2 = c(NA, "crackle", "crackle"),
                 col3 = c(NA, NA, "pop"),
                 col4 = c(NA, "yummy", NA))

Using the data frame above, I'd like to concatenate col1:col4 and store the result in x$treats, so that:

    x$treats[1]
    "snap"

    x$treats[2]
    "snap, crackle, and yummy"

    x$treats[3]
    "crackle and pop"

The dataset also has an ID variable that should not be included in the concatenation (so solutions that don't allow me to specify the required columns aren't complete).

Upvotes: 3

Views: 92

Answers (2)

Onyambu

Reputation: 79198

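> x <- data.frame(ID = 1:3,
+                 col1 = c("snap", "snap", NA),
+                 col2 = c(NA, "crackle", "crackle"),
+                 col3 = c(NA, NA, "pop"),
+                 col4 = c(NA, "yummy", NA),
+                 stringsAsFactors = FALSE)

> # Blank out the NAs, paste each row together, then turn the gaps between words into ", "
> a <- gsub("(\\w)\\s+", "\\1, ", trimws(do.call(paste, replace(x[-1], is.na(x[-1]), ""))))

> # Insert "and" before the last word of any row that contains a comma
> (x1 <- transform(x, treat = gsub(",\\s(\\w+)$", ", and \\1", a), stringsAsFactors = FALSE))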
  ID col1    col2 col3  col4                    treat
1  1 snap    <NA> <NA>  <NA>                     snap
2  2 snap crackle <NA> yummy snap, crackle, and yummy
3  3 <NA> crackle  pop  <NA>         crackle, and pop

> x1$treat[1]
[1] "snap"

> x1$treat[2]
[1] "snap, crackle, and yummy"

> x1$treat[3]
[1] "crackle, and pop"

You can also use glue_collapse() from the glue package (it was called collapse() in older glue releases):

> x$treat <- apply(x[-1], 1, function(y) glue::glue_collapse(y[!is.na(y)], ", ", last = ", and "))
> x$treat[1]
[1] "snap"

> x$treat[2]
[1] "snap, crackle, and yummy"

> x$treat[3]
[1] "crackle, and pop"
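
Note that both versions above put a comma before "and" even when a row has only two items ("crackle, and pop"), whereas the question asks for "crackle and pop". A minimal sketch of one way to patch the regex version in base R follows. The final `sub()` step is an addition of mine, not part of the code above, and the `cols` vector is just an illustrative way to pick the columns by name. It assumes, as the regex approach already does, that every value is a single word with no spaces or commas.

```r
x <- data.frame(ID = 1:3,
                col1 = c("snap", "snap", NA),
                col2 = c(NA, "crackle", "crackle"),
                col3 = c(NA, NA, "pop"),
                col4 = c(NA, "yummy", NA),
                stringsAsFactors = FALSE)

# Columns to concatenate, chosen by name so ID is never touched
cols <- paste0("col", 1:4)

# Same two steps as above: blank the NAs, paste, then comma-separate the words
vals <- replace(x[cols], is.na(x[cols]), "")
a <- gsub("(\\w)\\s+", "\\1, ", trimws(do.call(paste, vals)))
a <- gsub(",\\s(\\w+)$", ", and \\1", a)

# Extra step (assumption: single-word values): if ", and" is the only comma
# in the string, the row has exactly two items, so drop that comma
x$treats <- sub("^([^,]+), and ", "\\1 and ", a)

x$treats
# [1] "snap"                     "snap, crackle, and yummy"
# [3] "crackle and pop"
```

Rows with one item or with three or more items pass through the last `sub()` unchanged, because the pattern only matches when nothing but the first item precedes ", and".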

Upvotes: 0

Mikko Marttila

Reputation: 11878

Here's another option, although considerably more verbose. By wrapping the list generation into a function, we can also add an option to disable the Oxford comma, if desired:

x <-  data.frame(
  ID = 1:3,
  col1 = c("snap", "snap", NA),
  col2 = c(NA, "crackle", "crackle"),
  col3 = c(NA, NA, "pop"),
  col4 = c(NA, "yummy", NA)
)

language_list <- function(x, oxford_comma = TRUE) {
  x <- x[!is.na(x)]

  if (length(x) < 2) {
    return(x)
  }

  last <- tail(x, 1)
  rest <- head(x, -1)

  if (length(rest) == 1) {
    return(paste(rest, "and", last))
  }

  rest <- paste(rest, collapse = ", ")    
  paste0(rest, if (oxford_comma) ",", " and ", last)
}

cols <- paste0("col", 1:4)
x$treats <- apply(x[, cols], 1, language_list) 

x$treats                                            
#> [1] "snap"                     "snap, crackle, and yummy"
#> [3] "crackle and pop"
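
To show the `oxford_comma` switch in use: `apply()` forwards any extra arguments to the function it calls, so the option can be passed straight through. The sketch below condenses `language_list()` so the block runs on its own. The condensed body and the `no_oxford` name are mine, but the logic is meant to match the function above.

```r
x <- data.frame(
  ID = 1:3,
  col1 = c("snap", "snap", NA),
  col2 = c(NA, "crackle", "crackle"),
  col3 = c(NA, NA, "pop"),
  col4 = c(NA, "yummy", NA),
  stringsAsFactors = FALSE
)

# Condensed restatement of language_list() from above
language_list <- function(x, oxford_comma = TRUE) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 2) return(x)                          # zero or one item: nothing to join
  if (n == 2) return(paste(x[1], "and", x[2]))  # two items: no comma at all
  paste0(paste(x[-n], collapse = ", "), if (oxford_comma) ",", " and ", x[n])
}

cols <- paste0("col", 1:4)

# Arguments after the function name are passed on to language_list()
no_oxford <- apply(x[, cols], 1, language_list, oxford_comma = FALSE)

no_oxford
#> [1] "snap"                    "snap, crackle and yummy"
#> [3] "crackle and pop"
```

One caveat: for a row where every selected column is NA, `language_list()` returns `character(0)`, and `apply()` then returns a list rather than a character vector. If such rows can occur, it may be worth returning `NA_character_` or `""` for the empty case instead.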

Upvotes: 1
