Velton Sousa
Velton Sousa

Reputation: 355

Missing values at table join with dplyr left_join

I have some problems to join tables with dplyr::left_join. I created the following data.frame:

conservation <- structure(list(conservation1 = c("EX ", "EW ", "CR ", "EN ", 
"VU ", "NT ", "LC ", "DD ", "NE ", "PE ", "PEW "), description = c(" Extinct", 
" Extinct em the wild", " Critically Endangered", " Endangered", 
" Vulnerable", " Near Threatened", " Least Concern", " Data deficient", 
" Not evaluated", " Probably extinct (informal)", " Probably extinct em the wild (informal)"
)), class = "data.frame", row.names = c(NA, -11L))

I want to add it to the conservation of the msleep of ggplot2, to obtain the conservation of the msleep with the description.

The msleep variables I am considering are:

msleep <- structure(list(name = c("Cheetah", "Owl monkey", "Mountain beaver", 
"Greater short-tailed shrew", "Cow", "Three-toed sloth", "Northern fur seal", 
"Vesper mouse", "Dog", "Roe deer", "Goat", "Guinea pig", "Grivet", 
"Chinchilla", "Star-nosed mole", "African giant pouched rat", 
"Lesser short-tailed shrew", "Long-nosed armadillo", "Tree hyrax", 
"North American Opossum", "Asian elephant", "Big brown bat", 
"Horse", "Donkey", "European hedgehog", "Patas monkey", "Western american chipmunk", 
"Domestic cat", "Galago", "Giraffe", "Pilot whale", "Gray seal", 
"Gray hyrax", "Human", "Mongoose lemur", "African elephant", 
"Thick-tailed opposum", "Macaque", "Mongolian gerbil", "Golden hamster", 
"Vole ", "House mouse", "Little brown bat", "Round-tailed muskrat", 
"Slow loris", "Degu", "Northern grasshopper mouse", "Rabbit", 
"Sheep", "Chimpanzee", "Tiger", "Jaguar", "Lion", "Baboon", "Desert hedgehog", 
"Potto", "Deer mouse", "Phalanger", "Caspian seal", "Common porpoise", 
"Potoroo", "Giant armadillo", "Rock hyrax", "Laboratory rat", 
"African striped mouse", "Squirrel monkey", "Eastern american mole", 
"Cotton rat", "Mole rat", "Arctic ground squirrel", "Thirteen-lined ground squirrel", 
"Golden-mantled ground squirrel", "Musk shrew", "Pig", "Short-nosed echidna", 
"Eastern american chipmunk", "Brazilian tapir", "Tenrec", "Tree shrew", 
"Bottle-nosed dolphin", "Genet", "Arctic fox", "Red fox"), conservation = c("lc", 
NA, "nt", "lc", "domesticated", NA, "vu", NA, "domesticated", 
"lc", "lc", "domesticated", "lc", "domesticated", "lc", NA, "lc", 
"lc", "lc", "lc", "en", "lc", "domesticated", "domesticated", 
"lc", "lc", NA, "domesticated", NA, "cd", "cd", "lc", "lc", NA, 
"vu", "vu", "lc", NA, "lc", "en", NA, "nt", NA, "nt", NA, "lc", 
"lc", "domesticated", "domesticated", NA, "en", "nt", "vu", NA, 
"lc", "lc", NA, NA, "vu", "vu", NA, "en", "lc", "lc", NA, NA, 
"lc", NA, NA, "lc", "lc", "lc", NA, "domesticated", NA, NA, "vu", 
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -83L), class = c("tbl_df", 
"tbl", "data.frame"))

To achieve the goal, I am applying:

msleep %>% 
  select(name, conservation) %>% 
  mutate(conservation = toupper(conservation)) %>% 
  left_join(conservation1, by = c('conservation'='conservation1'))

My intuition says that this could work, however, the results for the description column give me missing values. Can someone help me ? I am a new user of dplyr. I really appreciate any help.

Upvotes: 0

Views: 183

Answers (3)

Ian Campbell
Ian Campbell

Reputation: 24770

As @csgroen notes, but does not explicitly show a solution for, you can use trimws from base R to remove the lagging spaces:

msleep %>% 
  select(name, conservation) %>% 
  mutate(conservation = toupper(conservation)) %>% 
  left_join(conservation %>% mutate(conservation1 = trimws(conservation1)),
            by = c("conservation" = "conservation1"))
## A tibble: 83 x 3
#   name                       conservation description       
#   <chr>                      <chr>        <chr>             
# 1 Cheetah                    LC           " Least Concern"  
# 2 Owl monkey                 NA            NA               
# 3 Mountain beaver            NT           " Near Threatened"
# 4 Greater short-tailed shrew LC           " Least Concern"  
# 5 Cow                        DOMESTICATED  NA               
# 6 Three-toed sloth           NA            NA               
# 7 Northern fur seal          VU           " Vulnerable"     
# 8 Vesper mouse               NA            NA               
# 9 Dog                        DOMESTICATED  NA               
#10 Roe deer                   LC           " Least Concern" 
## … with 73 more rows

Upvotes: 1

Limey
Limey

Reputation: 12461

As @csgroen says, the fundamental problem is that your keys don't match: your category keys are upper case and have trailing blanks. You can also make life simpler for yourself by using the same name for the key in your look up as in the observed data. This gives you what you want:

conservation <- conservation %>% rename(conservation=conservation1)
msleep %>% 
  mutate(conservation=toupper(conservation)) %>% 
  left_join(conservation, by="conservation")

Upvotes: 0

csgroen
csgroen

Reputation: 2541

welcome to SO!

The problem here is that the levels don't match between the things you want to join, so dplyr doesn't know how to join these things.

unique(conservation$conservation1)
[1] "lc"           NA             "nt"           "domesticated" "vu"           "en"           "cd"     

unique(conservation$conservation1)
 [1] "EX "  "EW "  "CR "  "EN "  "VU "  "NT "  "LC "  "DD "  "NE "  "PE "  "PEW "

The levels should be the same (or at least have values in common) for a join.

Upvotes: 1

Related Questions