I have two columns of character vectors in a data.frame d:
t1 <- c("vector, market", "phone34, fax", "material55, animal", "cave", "monday", "fast98")
t2 <- c("vector, market", "phone, fax", "summer, animal", "pan23", "monday", "fast98, ticket")
d <- data.frame(t1, t2, stringsAsFactors=FALSE)
d
t1 t2
1 vector, market vector, market
2 phone34, fax phone, fax
3 material55, animal summer, animal
4 cave pan23
5 monday monday
6 fast98 fast98, ticket
I want to concatenate the two columns into a single column t3 without any duplicates. Using paste alone gives me duplicates:
d$t3 <- paste(d$t1, d$t2, sep=", ")
> d
t1 t2 t3
1 vector, market vector, market vector, market, vector, market
2 phone34, fax phone, fax phone34, fax, phone, fax
3 material55, animal summer, animal material55, animal, summer, animal
4 cave pan23 cave, pan23
5 monday monday monday, monday
6 fast98 fast98, ticket fast98, fast98, ticket
The desired result is:
t1 t2 t3
1 vector, market vector, market vector, market
2 phone34, fax phone, fax phone34, phone, fax
3 material55, animal summer, animal material55, animal, summer
4 cave pan23 cave, pan23
5 monday monday monday
6 fast98 fast98, ticket fast98, ticket
How can I do this efficiently in R? Is there a vectorized solution?
You need to strsplit each entry of each vector, take the union of the resulting vectors row by row, and paste the pieces back together:
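For a single row (row 2 of the example data), the three steps look like this. This is just an illustrative trace; the variable names a, b, pa, pb, u and res are made up for the example.

```r
# Row 2 of the example data
a <- "phone34, fax"
b <- "phone, fax"

# 1. Split each string on ", " into a character vector
pa <- strsplit(a, split = ", ")[[1]]   # "phone34" "fax"
pb <- strsplit(b, split = ", ")[[1]]   # "phone"   "fax"

# 2. union() keeps each element once, in order of first appearance
u <- union(pa, pb)                     # "phone34" "fax" "phone"

# 3. Paste the pieces back into one comma-separated string
res <- paste(u, collapse = ", ")
res
#> [1] "phone34, fax, phone"
```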
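t1s <- strsplit(d$t1, split = ", ")  # list of character vectors, one per row
t2s <- strsplit(d$t2, split = ", ")  # list of character vectors, one per row
# for each row, take the union of the elements and paste them together into a single string
# (seq_along(t1s) indexes the split list itself instead of relying on a global t1)
d$t3 <- sapply(seq_along(t1s), function(i) paste(union(t1s[[i]], t2s[[i]]), collapse = ", "))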
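The same approach can also be written with mapply, which walks the two split lists in parallel so no index variable is needed. This is a sketch rather than a truly vectorized solution: mapply still loops over the rows internally. The helper name merge_unique is hypothetical, chosen only for this example.

```r
t1 <- c("vector, market", "phone34, fax", "material55, animal", "cave", "monday", "fast98")
t2 <- c("vector, market", "phone, fax", "summer, animal", "pan23", "monday", "fast98, ticket")
d <- data.frame(t1, t2, stringsAsFactors = FALSE)

# Hypothetical helper: split both columns on `sep`, take the union row by row,
# and paste each row's pieces back into one string
merge_unique <- function(x, y, sep = ", ") {
  xs <- strsplit(x, split = sep, fixed = TRUE)  # fixed = TRUE: treat sep literally, not as a regex
  ys <- strsplit(y, split = sep, fixed = TRUE)
  # mapply pairs xs[[i]] with ys[[i]]; USE.NAMES = FALSE keeps the result unnamed
  mapply(function(a, b) paste(union(a, b), collapse = sep),
         xs, ys, USE.NAMES = FALSE)
}

d$t3 <- merge_unique(d$t1, d$t2)
d$t3
```

Because union() keeps elements in order of first appearance, row 2 comes out as "phone34, fax, phone" rather than the "phone34, phone, fax" shown in the question's desired output; the other rows match it.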
I hope that helps.