Concatenating strings in a column according to values in another column in a dataframe

Question

I have a data.frame with two columns of strings as follows.

nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune", "smoke, parchment, smell, book", "mental, past, create", "key, fortune, mask, federal", "tell, warn, slip", "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors=FALSE)

d
   nos                           ren
1  JM1             book, vend, spent
2  JM2             marigold, fortune
3  JM3 smoke, parchment, smell, book
4  JM1          mental, past, create
5  JM5   key, fortune, mask, federal
6 JM45              tell, warn, slip
7  JM3             wire, dg333, uv12
8 JM45   tell, warn, slip, furniture

I want to concatenate the elements of the ren column according to the strings in the nos column.

For example, in the sample data, the elements associated with JM1, which occurs twice, should be merged ("book, vend, spent, mental, past, create").

The elements associated with JM45 should also be merged, keeping only the unique words ("tell, warn, slip, furniture").

The output I am trying to get is shown below.

nos1 <- c("JM1", "JM2", "JM3", "JM5", "JM45")
ren1 <- c("book, vend, spent, mental, past, create", "marigold, fortune", "smoke, parchment, smell, book, wire, dg333, uv12", "key, fortune, mask, federal", "tell, warn, slip, furniture")
out <- data.frame(nos1, ren1, stringsAsFactors=FALSE)

out
  nos1                                             ren1
1  JM1          book, vend, spent, mental, past, create
2  JM2                                marigold, fortune
3  JM3 smoke, parchment, smell, book, wire, dg333, uv12
4  JM5                      key, fortune, mask, federal
5 JM45                      tell, warn, slip, furniture

How can I do this in R? My original data set has thousands of such rows in a data.frame.

iugrina · Accepted Answer

Using the plyr package, you could do it like this:

library(plyr)
ddply(d, .(nos), summarise, ren1=paste0(ren, collapse=", "))

or, if you want only the unique words in ren1, like this:

ddply(d, .(nos), summarise, 
      ren1=paste0(unique(unlist(strsplit(ren, split=", "))), collapse=", "))
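For comparison, the same split, de-duplicate and paste idea can be sketched in base R without plyr. This is only one possible approach, not part of the answer above: merge_words is a hypothetical helper name, and the factor levels are set by first appearance on the assumption that the row order shown in the question's desired output (JM1, JM2, JM3, JM5, JM45) matters.

```r
# Sample data from the question
nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune",
         "smoke, parchment, smell, book", "mental, past, create",
         "key, fortune, mask, federal", "tell, warn, slip",
         "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors = FALSE)

# Hypothetical helper: split every string on ", ", pool the words,
# keep the first occurrence of each, and paste them back together
merge_words <- function(x) {
  paste(unique(unlist(strsplit(x, ", ", fixed = TRUE))), collapse = ", ")
}

# Group ren by nos; levels follow first appearance, so the groups are
# not re-sorted alphabetically
groups <- split(d$ren, factor(d$nos, levels = unique(d$nos)))

# Apply the helper to each group; vapply enforces one string per group
ren1 <- vapply(groups, merge_words, character(1))

out <- data.frame(nos1 = names(ren1), ren1 = unname(ren1),
                  stringsAsFactors = FALSE)
out
```

With thousands of rows this stays a single pass over the groups; whether it is faster or slower than ddply on the real data has not been measured here.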
