I am working on merging a data frame `df0` with a geographical object. Previously, I used dplyr to add a column of interest to my geographical data, following the approach suggested [here][1]. It works fine with my big dataset, but when I try the same approach with a simpler dataset I cannot replicate it. Here is an overview of the problem.
`df0` is a list that contains two columns: "Country" and "PF". It looks like this:
Country PF
1 Afghanistan 3
2 Albania 3
3 Algeria 3
4 American Samoa 0
5 Andorra 3
6 Angola 3
7 Anguilla 0
8 Antigua & Barbuda 0
9 Argentina 1
10 Armenia 3
11 Aruba 0
The geographical object `world` comes from the `rnaturalearth` package as follows:
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", returnclass = "sf")
world$Country<-noquote(world$name)
This is what the resulting `world$Country` looks like:
[1] Aruba Afghanistan Angola
[4] Anguilla Albania Aland
[7] Andorra United Arab Emirates Argentina
[10] Armenia American Samoa Antarctica
[13] Ashmore and Cartier Is. Fr. S. Antarctic Lands Antigua and Barb.
[16] Australia Austria Azerbaijan
[19] Burundi Belgium Benin
[22] Burkina Faso Bangladesh Bulgaria
The idea is to attach the column "PF" to the object `world`. To do this, I use this piece of code:
library(dplyr)
df_sum <- df0%>%
filter(Country %in% world$Country) %>%
group_by(Country) %>%
summarise(PF= mean(PF))
world$PF<- df_sum$PF[match(world$Country, df_sum$Country)]
Normally, this does the job. However, for some reason it is not working this time. I have noticed that the object `df_sum` contains zero observations after running the code, which means that the first part of the code (the `filter()` step) is the one failing. As an amateur programmer, I suspect I am missing some very basic notion. Could you help me out?
Edit in response to the answer provided
Indeed, I suspect that the problem comes from `df0`. This is how I prepare it:
df0<-read.csv("C:/Users/public_funding.csv",sep=",")
df0$X<-NULL
colnames(df0)<-c("Country","PF")
#df0$Country<-levels(droplevels(df0$Country))
#df0$Country<-unlist(df0$Country)
head(df0)
nrow(df0)
This is what the data looks like:
[![df0$Country][2]][2]
[![df0$Country][3]][3]
I thought that my problems were caused by the list structure that can be seen in the images. That is why my code shows that I tried both `df0$Country<-levels(droplevels(df0$Country))` and `df0$Country<-unlist(df0$Country)`, but neither worked.
[1]: Merging a Shapefile and a dataframe
[2]: https://i.sstatic.net/cBva8.png
[3]: https://i.sstatic.net/QYz2N.png
Upvotes: 0
Views: 83
Reputation: 301
It turns out that the problem was indeed in `df0`. After carefully going through it, I realized that for some reason there was a blank space after each country name, so none of the names matched `world$Country` exactly. My code was fixed by simply applying:
df0$Country<-trimws(df0$Country, "r")
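To illustrate why a trailing space is enough to empty `df_sum`, here is a minimal sketch in base R. The vectors `world_names` and `raw_names` are made-up stand-ins for `world$Country` and the untrimmed `df0$Country`, not my actual data:

```r
# Hypothetical stand-ins for world$Country and the untrimmed df0$Country
world_names <- c("Aruba", "Afghanistan", "Angola")
raw_names   <- c("Afghanistan ", "Angola ", "Aruba ")  # note the trailing space

# %in% compares strings exactly, so a padded name never matches
matches_raw <- raw_names %in% world_names

# Trimming whitespace on the right restores the matches
clean_names   <- trimws(raw_names, which = "right")
matches_clean <- clean_names %in% world_names

print(matches_raw)
print(matches_clean)
```

With every name padded, `filter(Country %in% world$Country)` keeps no rows, which is consistent with the empty `df_sum` I was seeing.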
Upvotes: 0
Reputation: 500
I recreated `df0`, ran the rest of your code, and it worked fine for me:
library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)
library(dplyr)
df0 <- data.frame(Country = c("Afghanistan", "Albania", "Algeria", "American Samoa",
                              "Andorra", "Angola", "Anguilla", "Antigua & Barbuda",
                              "Argentina", "Armenia", "Aruba"),
                  PF = c(3,3,3,0,3,3,0,0,1,3,0), stringsAsFactors = FALSE)
world <- ne_countries(scale = "medium", returnclass = "sf")
world$Country<-noquote(world$name)
df_sum <- df0 %>%
filter(Country %in% world$Country) %>%
group_by(Country) %>%
summarise(PF= mean(PF))
world$PF<- df_sum$PF[match(world$Country, df_sum$Country)]
> world$PF
[1] 0 3 3 0 3 NA 3 NA 1 3 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[35] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 NA NA NA NA NA
[69] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[103] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[137] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[171] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[205] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[239] NA NA NA
> df_sum
# A tibble: 10 x 2
Country PF
<chr> <dbl>
1 Afghanistan 3
2 Albania 3
3 Algeria 3
4 American Samoa 0
5 Andorra 3
6 Angola 3
7 Anguilla 0
8 Argentina 1
9 Armenia 3
10 Aruba 0
Since you said that `df_sum` contains zero observations after running the code, I wonder if the problem is with `df0`. Try recreating `df0` from scratch like I did; if you get the same output as above, the problem is likely coming from how you are reading in `df0`.
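One possible way to check the imported names is to list the ones that have no exact counterpart in the map data and print them with explicit quotes. This is only a sketch: the short `world_names` vector and the padded value in `df0` are hypothetical stand-ins, chosen to mimic the names shown in the question.

```r
# Hypothetical stand-in for world$Country
world_names <- c("Aruba", "Afghanistan", "Angola", "Antigua and Barb.")

# Hypothetical df0 with one padded name and one differently spelled name
df0 <- data.frame(Country = c("Afghanistan ", "Angola", "Antigua & Barbuda"),
                  PF = c(3, 3, 0), stringsAsFactors = FALSE)

# Names in df0 that have no exact match in the map data
unmatched <- setdiff(df0$Country, world_names)

# Wrapping each value in quotes makes stray spaces visible
print(sprintf('"%s"', unmatched))

# Flag values that change when surrounding whitespace is trimmed
has_padding <- df0$Country != trimws(df0$Country)
print(df0$Country[has_padding])
```

Names flagged by `has_padding` can be repaired with `trimws()`. Names that are simply spelled differently in the two sources, such as "Antigua & Barbuda" versus "Antigua and Barb.", would still need to be recoded by hand before they match.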
Upvotes: 1