Frank Doe

Reputation: 192

Comparing two dataframes in R with different number of rows

I have two data frames with the same structure, shown below:

Country Name          Country Code  Region        Year  Fertility Rate
Aruba                 ABW           The Americas  1960  4.82
Afghanistan           AFG           Asia          1960  7.45
Angola                AGO           Africa        1960  7.379
Albania               ALB           Europe        1960  6.186
United Arab Emirates  ARE           Middle East   1960  6.928
Argentina             ARG           The Americas  1960  3.109
Armenia               ARM           Asia          1960  4.55
Antigua and Barbuda   ATG           The Americas  1960  4.425
Australia             AUS           Oceania       1960  3.453
Austria               AUT           Europe        1960  2.69
Azerbaijan            AZE           Asia          1960  5.571
Burundi               BDI           Africa        1960  6.953
Belgium               BEL           Europe        1960  2.54

I would like to create a data frame where I list out which countries are missing from the "merged" data frame as compared with the "merged2013" data frame. (Not my naming conventions)

I have tried numerous things I found on the internet. Only the code below works, and not quite the way I would like:

newmerged1 <- (paste(merged$Country.Name) %in% paste(merged2013$Country.Name))+1
newmerged1

This returns 1 for each country in merged that isn't found in the merged2013 data frame, and 2 otherwise. I assume there is a way to get it to list the country names instead of a 1 or 2, or to produce just a list of the countries not found in merged2013 without everything else.

Upvotes: 1

Views: 1159

Answers (1)

sconfluentus

Reputation: 4993

You could use dplyr's anti_join, which is designed for exactly this kind of comparison.

library(dplyr)

# keep the rows of merged2013 whose Country.Name has no match in merged
missing_data <- anti_join(merged2013, merged, by = "Country.Name")

This returns all the rows of merged2013 whose Country.Name does not appear in merged. Swap the first two arguments to check in the other direction.
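
If you would rather not load dplyr, roughly the same result can be sketched in base R. The two small data frames here are made-up stand-ins for merged and merged2013 (hypothetical values, not your data), and missing_rows / missing_names are just illustrative names:

```r
# Toy stand-ins for the two data frames (hypothetical values, not the real data)
merged2013 <- data.frame(
  Country.Name = c("Aruba", "Angola", "Albania", "Belgium"),
  Country.Code = c("ABW", "AGO", "ALB", "BEL"),
  stringsAsFactors = FALSE
)
merged <- data.frame(
  Country.Name = c("Aruba", "Albania"),
  Country.Code = c("ABW", "ALB"),
  stringsAsFactors = FALSE
)

# Full rows of merged2013 whose Country.Name does not appear in merged
missing_rows <- merged2013[!(merged2013$Country.Name %in% merged$Country.Name), ]

# Just the country names, as a character vector
missing_names <- setdiff(merged2013$Country.Name, merged$Country.Name)
print(missing_names)
```

The %in% test is the same one used in the question; negating it and using it as a row index gives the rows rather than a vector of 1s and 2s. setdiff is the shorter route when only the names are needed.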

Upvotes: 3
