Matteo
Matteo

Reputation: 73

Merging rows in R

I want to merge 2 data frames with matching columns and a common factor, but with different values in the rows, like this:

x <- data.frame(x = c("Uno", "Due", "Tre", "Quattro", "Cinque"), y = c("A", "B", "C", "D", "E"))
           x   y
     1     Uno A
     2     Due B
     3     Tre C
     4 Quattro D
     5  Cinque E

y <- data.frame(x = c("Uno", "Tre", "Cinque"), y = c("F", "G", "H")) 
           x   y
     1     Uno F
     2     Tre G
     3  Cinque H

and I want to get:

       x   y
 1     Uno A, F
 2     Due    B
 3     Tre C, G
 4 Quattro    D
 5  Cinque E, H

A lot of posts solve this with aggregate and merge for numbers, and I don't know what to do for strings. Do I have to make my own function, with function(x, etc)?

Upvotes: 2

Views: 138

Answers (2)

JereB
JereB

Reputation: 137

You can easily do this using dplyr

x <- data.frame(x = c("Uno", "Due", "Tre", "Quattro", "Cinque"), y = c("A", "B", "C", "D", "E"), stringsAsFactors = F)

y <- data.frame(x = c("Uno", "Tre", "Cinque"), y = c("F", "G", "H"), stringsAsFactors = F) 


jdf <- full_join(x,y,"x") 

transmute(jdf, x = x, y = ifelse(is.na(y.y),y.x,paste(y.x, y.y, sep=', ')))

 x    y
1     Uno A, F
2     Due    B
3     Tre C, G
4 Quattro    D
5  Cinque E, H

Upvotes: 1

David Arenburg
David Arenburg

Reputation: 92282

If your values in the y column are characters rather factors, this could be an easy task using data.table

x <- data.frame(x = c("Uno", "Due", "Tre", "Quattro", "Cinque"), 
                y = c("A", "B", "C", "D", "E"),
                stringsAsFactors = FALSE)
y <- data.frame(x = c("Uno", "Tre", "Cinque"), 
                y = c("F", "G", "H"),
                stringsAsFactors = FALSE)

(Notice the stringsAsFactors = FALSE)

Then (assuming you always doing a left join)

library(data.table)
setDT(x)[y, y := toString(c(y, i.y)), on = "x", by = .EACHI]
x
#          x    y
# 1:     Uno A, F
# 2:     Due    B
# 3:     Tre C, G
# 4: Quattro    D
# 5:  Cinque E, H

This is basically combining y columns from both data sets on the fly (while joining on the x columns) and updates y in place (without creating a new data set).

Upvotes: 3

Related Questions