Curious G.
Curious G.

Reputation: 877

Merge columns except NA values in r

I have these data file:

dt.1

Father  Daughter
Peter     1
Josh      3
Cold      4
NA .      5
NA .      6
NA .      7

dt.2

Father  Weight
Peter     10
Josh      33
Cold      44
NA .      55
NA .      65
NA .      77

And I would like to merge except NA values. I need this:

Father       Weight    Daughter
    Peter     10         1
    Josh      33         2
    Cold      44         3
    NA .      55         NA
    NA .      65         NA
    NA .      77         NA
    NA        NA          5
    NA        NA          6
    NA        AN          7

I tried a Normal merge:

new.data=merge(dt1,dt2, by="Father", all=T)

But don't worked, and the new file give me a lot of rows more. So, I would like to merge considering only the real values.

Upvotes: 1

Views: 1102

Answers (2)

user10191355
user10191355

Reputation:

Using dplyr and tidyr you can replace the NAs in df1 and df2 with placeholders, join the dataframes, and then convert the placeholders back into NAs:

library(dplyr)
library(tidyr)

replace_na(df1, list(Father = "NA1")) %>% 
    full_join(replace_na(df2, list(Father = "NA2"))) %>% 
    mutate(Father = sub("NA.*", NA, Father))

#### OUTPUT ####

 Father Daughter Weight
1  Peter        1     10
2   Josh        3     33
3   Cold        4     44
4   <NA>        5     NA
5   <NA>        6     NA
6   <NA>        7     NA
7   <NA>       NA     55
8   <NA>       NA     65
9   <NA>       NA     77

Using base R you can first merge the parts of the dataframe without NAs, then rbind the parts with NAs:

df3 <- merge(subset(df1, !is.na(Father)), df2, by = "Father")
df1$Weight <- df2$Daughter <- NA
rbind(df_final, subset(df2, is.na(Father)), subset(df1, is.na(Father)))

#### OUTPUT ####

   Father Daughter Weight
1    Cold        4     44
2    Josh        3     33
3   Peter        1     10
4    <NA>       NA     55
5    <NA>       NA     65
6    <NA>       NA     77
41   <NA>        5     NA
51   <NA>        6     NA
61   <NA>        7     NA

Upvotes: 1

akrun
akrun

Reputation: 886948

Individually, filter the datasets without NA elements in 'Father', do a full_join and the bind the rows with the other NA rows

library(tidyverse)
dt1 %>% 
  filter(is.na(Father)) %>%
  bind_rows(dt2 %>% 
                filter(is.na(Father))) %>%
  bind_rows(full_join(dt1 %>% 
                        filter(!is.na(Father)),
                      dt2 %>% filter(!is.na(Father))))%>% 
  arrange(is.na(Father), is.na(Weight)) %>% 
  select(Father, Weight, Daughter)
#   Father Weight Daughter
#1  Peter     10        1
#2   Josh     33        3
#3   Cold     44        4
#4   <NA>     55       NA
#5   <NA>     65       NA
#6   <NA>     77       NA
#7   <NA>     NA        5
#8   <NA>     NA        6
#9   <NA>     NA        7

Or another option is to split by the presence of NAs and join with a logical condtion

map2_df(split(dt1, is.na(dt1$Father)), split(dt2, is.na(dt2$Father)),
     ~ if(all(is.na(.x$Father))) bind_rows(.x, .y) else full_join(.x, .y))
#   Father Daughter Weight
#1  Peter        1     10
#2   Josh        3     33
#3   Cold        4     44
#4   <NA>        5     NA
#5   <NA>        6     NA
#6   <NA>        7     NA
#7   <NA>       NA     55
#8   <NA>       NA     65
#9   <NA>       NA     77

data

dt1 <- structure(list(Father = c("Peter", "Josh", "Cold", NA, NA, NA
), Daughter = c(1L, 3L, 4L, 5L, 6L, 7L)), class = "data.frame", 
row.names = c(NA, 
-6L))

dt2 <- structure(list(Father = c("Peter", "Josh", "Cold", NA, NA, NA
), Weight = c(10L, 33L, 44L, 55L, 65L, 77L)), class = "data.frame",
row.names = c(NA, 
-6L))

Upvotes: 1

Related Questions