Reputation: 355
I have some problems to join tables with dplyr::left_join
.
I created the following data.frame:
conservation <- structure(list(conservation1 = c("EX ", "EW ", "CR ", "EN ",
"VU ", "NT ", "LC ", "DD ", "NE ", "PE ", "PEW "), description = c(" Extinct",
" Extinct em the wild", " Critically Endangered", " Endangered",
" Vulnerable", " Near Threatened", " Least Concern", " Data deficient",
" Not evaluated", " Probably extinct (informal)", " Probably extinct em the wild (informal)"
)), class = "data.frame", row.names = c(NA, -11L))
I want to add it to the conservation of the msleep of ggplot2, to obtain the conservation of the msleep with the description.
The msleep variables I am considering are:
msleep <- structure(list(name = c("Cheetah", "Owl monkey", "Mountain beaver",
"Greater short-tailed shrew", "Cow", "Three-toed sloth", "Northern fur seal",
"Vesper mouse", "Dog", "Roe deer", "Goat", "Guinea pig", "Grivet",
"Chinchilla", "Star-nosed mole", "African giant pouched rat",
"Lesser short-tailed shrew", "Long-nosed armadillo", "Tree hyrax",
"North American Opossum", "Asian elephant", "Big brown bat",
"Horse", "Donkey", "European hedgehog", "Patas monkey", "Western american chipmunk",
"Domestic cat", "Galago", "Giraffe", "Pilot whale", "Gray seal",
"Gray hyrax", "Human", "Mongoose lemur", "African elephant",
"Thick-tailed opposum", "Macaque", "Mongolian gerbil", "Golden hamster",
"Vole ", "House mouse", "Little brown bat", "Round-tailed muskrat",
"Slow loris", "Degu", "Northern grasshopper mouse", "Rabbit",
"Sheep", "Chimpanzee", "Tiger", "Jaguar", "Lion", "Baboon", "Desert hedgehog",
"Potto", "Deer mouse", "Phalanger", "Caspian seal", "Common porpoise",
"Potoroo", "Giant armadillo", "Rock hyrax", "Laboratory rat",
"African striped mouse", "Squirrel monkey", "Eastern american mole",
"Cotton rat", "Mole rat", "Arctic ground squirrel", "Thirteen-lined ground squirrel",
"Golden-mantled ground squirrel", "Musk shrew", "Pig", "Short-nosed echidna",
"Eastern american chipmunk", "Brazilian tapir", "Tenrec", "Tree shrew",
"Bottle-nosed dolphin", "Genet", "Arctic fox", "Red fox"), conservation = c("lc",
NA, "nt", "lc", "domesticated", NA, "vu", NA, "domesticated",
"lc", "lc", "domesticated", "lc", "domesticated", "lc", NA, "lc",
"lc", "lc", "lc", "en", "lc", "domesticated", "domesticated",
"lc", "lc", NA, "domesticated", NA, "cd", "cd", "lc", "lc", NA,
"vu", "vu", "lc", NA, "lc", "en", NA, "nt", NA, "nt", NA, "lc",
"lc", "domesticated", "domesticated", NA, "en", "nt", "vu", NA,
"lc", "lc", NA, NA, "vu", "vu", NA, "en", "lc", "lc", NA, NA,
"lc", NA, NA, "lc", "lc", "lc", NA, "domesticated", NA, NA, "vu",
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -83L), class = c("tbl_df",
"tbl", "data.frame"))
To achieve the goal, I am applying:
msleep %>%
select(name, conservation) %>%
mutate(conservation = toupper(conservation)) %>%
left_join(conservation1, by = c('conservation'='conservation1'))
My intuition says that this could work, however, the results for the description column give me missing values. Can someone help me ? I am a new user of dplyr. I really appreciate any help.
Upvotes: 0
Views: 183
Reputation: 24770
As @csgroen notes, but does not explicitly show a solution for, you can use trimws
from base R to remove the lagging spaces:
msleep %>%
select(name, conservation) %>%
mutate(conservation = toupper(conservation)) %>%
left_join(conservation %>% mutate(conservation1 = trimws(conservation1)),
by = c("conservation" = "conservation1"))
## A tibble: 83 x 3
# name conservation description
# <chr> <chr> <chr>
# 1 Cheetah LC " Least Concern"
# 2 Owl monkey NA NA
# 3 Mountain beaver NT " Near Threatened"
# 4 Greater short-tailed shrew LC " Least Concern"
# 5 Cow DOMESTICATED NA
# 6 Three-toed sloth NA NA
# 7 Northern fur seal VU " Vulnerable"
# 8 Vesper mouse NA NA
# 9 Dog DOMESTICATED NA
#10 Roe deer LC " Least Concern"
## … with 73 more rows
Upvotes: 1
Reputation: 12461
As @csgroen says, the fundamental problem is that your keys don't match: your category keys are upper case and have trailing blanks. You can also make life simpler for yourself by using the same name for the key in your look up as in the observed data. This gives you what you want:
conservation <- conservation %>% rename(conservation=conservation1)
msleep %>%
mutate(conservation=toupper(conservation)) %>%
left_join(conservation, by="conservation")
Upvotes: 0
Reputation: 2541
welcome to SO!
The problem here is that the levels don't match between the things you want to join, so dplyr
doesn't know how to join these things.
unique(conservation$conservation1)
[1] "lc" NA "nt" "domesticated" "vu" "en" "cd"
unique(conservation$conservation1)
[1] "EX " "EW " "CR " "EN " "VU " "NT " "LC " "DD " "NE " "PE " "PEW "
The levels should be the same (or at least have values in common) for a join.
Upvotes: 1