Reputation: 757
I have many data sets that I want to merge them and make them unique. I am trying to make a representative data here
df1 <- read.table(text="info var1 var2
1 C001 mytest1 NA
2 C002 mytest2 NA
3 C003 myse1 data1
4 C004 NA NA
5 C007 where1 India
6 C010 ohio city
11 C016 number fifty
12 C017 city rome", header=T, stringsAsFactors=F)
and this
df2 <- read.table(text="info var1 var2
1 C003 myse1 data1
2 C007 where1 India
3 C010 ohio city
4 C016 number fifty
5 C017 city rome
6 C022 country India
7 C023 number 10", header=T, stringsAsFactors=F)
df3 <- read.table(text="info var1 var2 var3
1 C017 city rome ind
2 C022 country India bes
3 C027 this there NA", header=T, stringsAsFactors=F)
I want to combine them all together bases on info but make them unique. when I want to combine all files, I do this
library(tidyverse)
library(dplyr)
list(df1, df2, df3) %>% reduce(full_join, by = "info")
but I want the output to be like this
info var1.x var2.x var3
C001 mytest1 NA NA
C002 mytest2 NA NA
C003 myse1 data1 NA
C004 NA NA NA
C007 where1 India NA
C010 ohio city NA
C016 number fifty NA
C017 city rome ind
C022 country India bes
C023 number 10 NA
C027 this there NA
Upvotes: 0
Views: 45
Reputation: 3195
I think this should work for you.
bind_rows(df1, df2, df3) %>%
unique() %>%
mutate(rsum = rowSums(!is.na(.))) %>%
group_by(info) %>%
filter(rsum == max(rsum)) %>%
select(-rsum)
info var1 var2 var3
<chr> <chr> <chr> <chr>
1 C001 mytest1 <NA> <NA>
2 C002 mytest2 <NA> <NA>
3 C003 myse1 data1 <NA>
4 C004 <NA> <NA> <NA>
5 C007 where1 India <NA>
6 C010 ohio city <NA>
7 C016 number fifty <NA>
8 C023 number 10 <NA>
9 C017 city rome ind
10 C022 country India bes
11 C027 this there <NA>
Upvotes: 1
Reputation: 2425
The following solution first generates your unique keys by which you seek to merge your datasets together, the shared "info" column. Then using a left join merge to add the respective columns from var1 in df1 and df2, var2 in df1 and df2, and var3 in df3
library(dplyr)
info <- data.frame(info=unique(c(df1$info,df2$info,df3$info)))
var1s <- unique(rbind(df1[,c("info","var1")],df2[,c("info","var1")],df3[,c("info","var1")]))
var2s <- unique(rbind(df1[,c("info","var2")],df2[,c("info","var2")],df3[,c("info","var2")]))
var3s <- unique(df3[,c("info","var3")])
merge(x=info,y=var1s,by="info",all.x=T) %>% merge(y=var2s,by="info",all.x=T) %>% merge(y=var3s,by="info",all.x=T)
Result:
> merge(x=info,y=var1s,by="info",all.x=T) %>% merge(y=var2s,by="info",all.x=T) %>% merge(y=var3s,by="info",all.x=T)
info var1 var2 var3
1 C001 mytest1 <NA> <NA>
2 C002 mytest2 <NA> <NA>
3 C003 myse1 data1 <NA>
4 C004 <NA> <NA> <NA>
5 C007 where1 India <NA>
6 C010 ohio city <NA>
7 C016 number fifty <NA>
8 C017 city rome ind
9 C022 country India bes
10 C023 number 10 <NA>
11 C027 this there <NA>
Upvotes: 0