Samuele Ramellini
Samuele Ramellini

Reputation: 23

How to merge two dataframe with several same IDs in r

I tried to merge two data.frames like the ones in the code. The main problem is that I have several rows with the same ID, but I would like to merge all the values (phase) with the ID in df1. I tried to search for similar questions but I couldn't manage to find any.

df1<-as.data.frame(c("a","a","a","a","a","c","c","c","b","b"))
colnames(df1)<-c("ID")
df2<-data.frame(c("a","a","a","a","a","b","b"),c(1,1,0,0,1,1,-1))
colnames(df2)<-c("ID","phase")

output<-cbind(c("a","a","a","a","a","c","c","c","b","b"),c(1,1,0,0,1,NA,NA,NA,1,-1))

I tried to use merge() but it resulted in a much bigger data.frame than the expected output. Moreover I also lost the NAs supposed to be merged with "c".

merge_out<-merge(df1,df2[,c("ID","phase")],by="ID")

ID phase
a     1
a     1
a     0
a     0
a     1
a     1
a     1
a     0
a     0
a     1
a     1
a     1
a     0
a     0
a     1
a     1
a     1
a     0
a     0
a     1
a     1
a     1
a     0
a     0
a     1
b     1
b    -1
b     1
b    -1

Any ideas? Thanks!

Upvotes: 2

Views: 896

Answers (2)

GKi
GKi

Reputation: 39647

Maybe you are looking for pmatch.

cbind(df1, phase=df2$phase[pmatch(df1$ID, df2$ID)])
#cbind(df1, df2[pmatch(df1$ID, df2$ID), "phase", drop = FALSE]) #Alternative
#   ID phase
#1   a     1
#2   a     1
#3   a     0
#4   a     0
#5   a     1
#6   c    NA
#7   c    NA
#8   c    NA
#9   b     1
#10  b    -1

Upvotes: 1

Karthik S
Karthik S

Reputation: 11584

Does this work:

library(dplyr)
> df1 %>% group_by(ID) %>%  mutate(uid = paste0(row_number(), ID)) %>% left_join(
+ df2%>% group_by(ID) %>% mutate(uid = paste0(row_number(), ID))
+ ) %>% select(-uid)
Joining, by = c("ID", "uid")
# A tibble: 10 x 2
# Groups:   ID [3]
   ID    phase
   <chr> <dbl>
 1 a         1
 2 a         1
 3 a         0
 4 a         0
 5 a         1
 6 c        NA
 7 c        NA
 8 c        NA
 9 b         1
10 b        -1

Upvotes: 1

Related Questions