Amir

Reputation: 45

Merging two data frames with different rows in R

I have two data frames. The first one looks like this:

Country    Year  production
Germany    1996  11
France     1996  12
Greece     1996  15
UK         1996  17
USA        1996  24

The second one contains all the countries in the first data frame, plus a few more, for the year 2018. It looks like this:

Country    Year   production
Germany    2018   27
France     2018   29
Greece     2018   44
UK         2018   46
USA        2018   99
Austria    2018   56
Japan      2018   66

I would like to merge the two data frames, and the final table should look like this:

Country    Year  production
Germany    1996   11
France     1996   12
Greece     1996   15
UK         1996   17
USA        1996   24
Austria    1996   NA
Japan      1996   NA
Germany    2018   27
France     2018   29
Greece     2018   44
UK         2018   46
USA        2018   99
Austria    2018   56
Japan      2018   66

I've tried several functions, including full_join, merge, and rbind, but none of them produced this result. Does anybody have any ideas?

Upvotes: 3

Views: 88

Answers (2)

Parfait

Reputation: 107567

Consider base R with expand.grid and merge (and avoid any dependencies should you be a package author):

# BUILD DF OF ALL POSSIBLE COMBINATIONS OF COUNTRY AND YEAR
all_country_years <- expand.grid(Country=unique(c(df_96$Country, df_18$Country)),
                                 Year=c(1996, 2018))

# MERGE (LEFT JOIN)
final_df <- merge(all_country_years, rbind(df_96, df_18), by=c("Country", "Year"), 
                  all.x=TRUE)

# ORDER DATA AND RESET ROW NAMES
final_df <- data.frame(with(final_df, final_df[order(Year, Country),]),
                       row.names = NULL)

final_df
#    Country Year production
# 1  Germany 1996         11
# 2   France 1996         12
# 3   Greece 1996         15
# 4       UK 1996         17
# 5      USA 1996         24
# 6  Austria 1996         NA
# 7    Japan 1996         NA
# 8  Germany 2018         27
# 9   France 2018         29
# 10  Greece 2018         44
# 11      UK 2018         46
# 12     USA 2018         99
# 13 Austria 2018         56
# 14   Japan 2018         66
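
The code above assumes df_96 and df_18 already exist. As a minimal sketch, they could be rebuilt from the tables in the question like this (the column types are my assumption: Country as character, Year and production as integer). The last lines also show why rbind on its own falls short: it only stacks the existing rows and never creates the missing 1996 rows for Austria and Japan.

```r
# Assumed inputs, typed in from the tables in the question
df_96 <- data.frame(
  Country    = c("Germany", "France", "Greece", "UK", "USA"),
  Year       = 1996L,
  production = c(11L, 12L, 15L, 17L, 24L),
  stringsAsFactors = FALSE
)

df_18 <- data.frame(
  Country    = c("Germany", "France", "Greece", "UK", "USA", "Austria", "Japan"),
  Year       = 2018L,
  production = c(27L, 29L, 44L, 46L, 99L, 56L, 66L),
  stringsAsFactors = FALSE
)

# rbind only stacks what is already there: 5 + 7 rows,
# with no Austria/1996 or Japan/1996 rows
stacked <- rbind(df_96, df_18)
n_stacked <- nrow(stacked)
```

With character Country columns, expand.grid should turn Country into a factor whose levels follow order of first appearance, which would explain why order(Year, Country) lists Germany first rather than sorting alphabetically. If your Country columns are already factors, the row order may differ.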


Upvotes: 2

tmfmnk

Reputation: 39858

With dplyr and tidyr, you may use:

library(dplyr)
library(tidyr)
bind_rows(df1, df2) %>%
 complete(Country, Year)

   Country  Year production
   <chr>   <int>      <int>
 1 Austria  1996         NA
 2 Austria  2018         56
 3 France   1996         12
 4 France   2018         29
 5 Germany  1996         11
 6 Germany  2018         27
 7 Greece   1996         15
 8 Greece   2018         44
 9 Japan    1996         NA
10 Japan    2018         66
11 UK       1996         17
12 UK       2018         46
13 USA      1996         24
14 USA      2018         99
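
Note that complete() returns the rows sorted by Country, not in the order shown in the question. If the original ordering matters, one possible sketch is to sort by Year and then by each country's first appearance in the inputs. Here df1 and df2 are rebuilt from the question's tables (the column types are an assumption), and country_order is a helper name introduced only for this example.

```r
library(dplyr)
library(tidyr)

# Assumed inputs, typed in from the tables in the question
df1 <- data.frame(
  Country    = c("Germany", "France", "Greece", "UK", "USA"),
  Year       = 1996L,
  production = c(11L, 12L, 15L, 17L, 24L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Country    = c("Germany", "France", "Greece", "UK", "USA", "Austria", "Japan"),
  Year       = 2018L,
  production = c(27L, 29L, 44L, 46L, 99L, 56L, 66L),
  stringsAsFactors = FALSE
)

# Countries in order of first appearance across both inputs
country_order <- unique(c(df1$Country, df2$Country))

result <- bind_rows(df1, df2) %>%
  complete(Country, Year) %>%                      # add missing Country/Year pairs as NA
  arrange(Year, match(Country, country_order))     # 1996 block first, original country order

print(result)
```

This keeps the same 14 rows as above and only changes their order, so the 1996 rows for Austria and Japan still carry NA in production.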

Upvotes: 2
