Reputation: 877
I have these data file:
dt.1
Father Daughter
Peter 1
Josh 3
Cold 4
NA . 5
NA . 6
NA . 7
dt.2
Father Weight
Peter 10
Josh 33
Cold 44
NA . 55
NA . 65
NA . 77
And I would like to merge except NA values. I need this:
Father Weight Daughter
Peter 10 1
Josh 33 2
Cold 44 3
NA . 55 NA
NA . 65 NA
NA . 77 NA
NA NA 5
NA NA 6
NA AN 7
I tried a Normal merge:
new.data=merge(dt1,dt2, by="Father", all=T)
But don't worked, and the new file give me a lot of rows more. So, I would like to merge considering only the real values.
Upvotes: 1
Views: 1102
Reputation:
Using dplyr and tidyr you can replace the NA
s in df1 and df2 with placeholders, join the dataframes, and then convert the placeholders back into NA
s:
library(dplyr)
library(tidyr)
replace_na(df1, list(Father = "NA1")) %>%
full_join(replace_na(df2, list(Father = "NA2"))) %>%
mutate(Father = sub("NA.*", NA, Father))
#### OUTPUT ####
Father Daughter Weight
1 Peter 1 10
2 Josh 3 33
3 Cold 4 44
4 <NA> 5 NA
5 <NA> 6 NA
6 <NA> 7 NA
7 <NA> NA 55
8 <NA> NA 65
9 <NA> NA 77
Using base R you can first merge the parts of the dataframe without NA
s, then rbind
the parts with NA
s:
df3 <- merge(subset(df1, !is.na(Father)), df2, by = "Father")
df1$Weight <- df2$Daughter <- NA
rbind(df_final, subset(df2, is.na(Father)), subset(df1, is.na(Father)))
#### OUTPUT ####
Father Daughter Weight
1 Cold 4 44
2 Josh 3 33
3 Peter 1 10
4 <NA> NA 55
5 <NA> NA 65
6 <NA> NA 77
41 <NA> 5 NA
51 <NA> 6 NA
61 <NA> 7 NA
Upvotes: 1
Reputation: 886948
Individually, filter
the datasets without NA
elements in 'Father', do a full_join
and the bind the rows with the other NA rows
library(tidyverse)
dt1 %>%
filter(is.na(Father)) %>%
bind_rows(dt2 %>%
filter(is.na(Father))) %>%
bind_rows(full_join(dt1 %>%
filter(!is.na(Father)),
dt2 %>% filter(!is.na(Father))))%>%
arrange(is.na(Father), is.na(Weight)) %>%
select(Father, Weight, Daughter)
# Father Weight Daughter
#1 Peter 10 1
#2 Josh 33 3
#3 Cold 44 4
#4 <NA> 55 NA
#5 <NA> 65 NA
#6 <NA> 77 NA
#7 <NA> NA 5
#8 <NA> NA 6
#9 <NA> NA 7
Or another option is to split
by the presence of NAs
and join with a logical condtion
map2_df(split(dt1, is.na(dt1$Father)), split(dt2, is.na(dt2$Father)),
~ if(all(is.na(.x$Father))) bind_rows(.x, .y) else full_join(.x, .y))
# Father Daughter Weight
#1 Peter 1 10
#2 Josh 3 33
#3 Cold 4 44
#4 <NA> 5 NA
#5 <NA> 6 NA
#6 <NA> 7 NA
#7 <NA> NA 55
#8 <NA> NA 65
#9 <NA> NA 77
dt1 <- structure(list(Father = c("Peter", "Josh", "Cold", NA, NA, NA
), Daughter = c(1L, 3L, 4L, 5L, 6L, 7L)), class = "data.frame",
row.names = c(NA,
-6L))
dt2 <- structure(list(Father = c("Peter", "Josh", "Cold", NA, NA, NA
), Weight = c(10L, 33L, 44L, 55L, 65L, 77L)), class = "data.frame",
row.names = c(NA,
-6L))
Upvotes: 1