Andre230
Andre230

Reputation: 145

Merging dataframes that have different columns

I have 2 dataframes:

df1 = data.frame(Bird_ID = c(1:6), Sex = c("Male","Female","Male","Male","Male","UNK"), Age.years =c("2","4","8","2","12","1"))

df2 = data.frame(Bird_ID = c(7), Sex = c("Female"), date.fledged= c("19/10/2021"))

df1
# Bird_ID Sex Age.years
# 1 Male 2
# 2 Female 4
# 3 Male 8
# 4 Male 2
# 5 Male 12
# 6 UNK 1
       
df2
# Bird_ID Sex Date.fledged
# 7 Female 19/10/2021
dfmerged = data.frame(Bird_ID = c(1:7), Sex = c("Male","Female","Male","Male","Male","UNK","Female"), Age.years =c("2","4","8","2","12","1",NA))`

dfmerged
# Bird_ID Sex Age.years
# 1 Male 2
# 2 Female 4
# 3 Male 8
# 4 Male 2
# 5 Male 12
# 6 UNK 1
# 7 Female NA

How can I update the bird database df1 using information from df2and keeping only (and all) columns that are in the main database df1? For example here dfmerged keeps only the columns from df1,drops the "Date.fledged" column from df2 and bird 7 has NA as "Age.years" because data is missing (and that is the wanted output).

Upvotes: 1

Views: 60

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389235

You may do a full join.

merge(df1, df2, by = c('Bird_ID', 'Sex'), all = TRUE)[-4]

#  Bird_ID    Sex Age.years
#1       1   Male         2
#2       2 Female         4
#3       3   Male         8
#4       4   Male         2
#5       5   Male        12
#6       6    UNK         1
#7       7 Female      <NA>

In dplyr -

library(dplyr)

full_join(df1, df2, by = c('Bird_ID', 'Sex')) %>%
  select(-date.fledged)

Upvotes: 1

Martin Gal
Martin Gal

Reputation: 16998

You could use

library(dplyr)

df1 %>% 
  bind_rows(df2) %>% 
  select(names(df1))

This returns

  Bird_ID    Sex Age.years
1       1   Male         2
2       2 Female         4
3       3   Male         8
4       4   Male         2
5       5   Male        12
6       6    UNK         1
7       7 Female      <NA>

Upvotes: 1

Related Questions