Reputation: 103
I am working with a dataset that has a column of country codes named "ccode".
To add a column of country names called "country", I use the function "countrycode" from the countrycode package, which I installed from CRAN, and get the following results:
votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode, "cown", "country.name"))
and the following warning message:
Warning message:
In countrycode(ccode, "cown", "country.name") :
Some values were not matched unambiguously: 260, 816
Since these country codes cannot be assigned a country name, I filtered them out of the dataframe:
> table(is.na(votes_processed$country))
FALSE TRUE
350844 2703
> votes_processed <- filter(votes_processed,!is.na(country))
> table(is.na(votes_processed$country))
FALSE
350844
Afterwards, I run the following commands to create another tibble with the total number of votes and the proportion of "yes" votes (vote == 1) by year and country:
# Group by year and country: by_year_country
by_year_country <- votes_processed %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))
Then I run the following command to nest the data by country, but the console shows the following warning and my country column disappears:
> nested <- by_year_country %>%
+ nest(-country)
Warning message:
Unknown or uninitialised column: 'country'.
> nested$country
NULL
Warning messages:
1: Unknown or uninitialised column: 'country'.
2: Unknown or uninitialised column: 'country'.
Could someone explain what is happening with this "country" column, why R does not recognize it, and what I can do about it?
I am new to this platform. A comment asked for a sample of the data, so I am pasting it here:
rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))
sample_data_votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
"ccode"=ccode)
Thank you very much for your time and advice.
Upvotes: 1
Views: 1518
Reputation: 1220
It looks like you need to remove the -country part from your call to nest:
library(dplyr)
library(tidyr)
library(countrycode)
rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))
votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
"ccode"=ccode)
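# Keep yes/abstain/no votes, add year and country name, drop unmatched codes
votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode, "cown", "country.name")) %>%
  filter(!is.na(country))
# Total votes and share of "yes" votes per year and country
by_year_country <- votes_processed %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))
# Nest all non-grouping columns
nested <- by_year_country %>%
  nest()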
Passing -country tells nest to nest every column except country. When called with no arguments, nest uses all columns except the grouping columns. by_year_country is still a grouped tibble: the summarize call removes only one level of grouping, so the result is no longer grouped by country but is still grouped by year. If you want to remove the grouping entirely, use ungroup().
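To see the leftover grouping for yourself, here is a minimal sketch on made-up data (the toy tibble and its values are hypothetical, not taken from the question). It assumes dplyr and a tidyr version of 1.0 or later, where the columns to nest are passed as a named argument such as data = -country; with older tidyr versions the nest(-country) form from the question would be used after ungroup() instead.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for votes_processed
toy <- tibble(
  year    = c(2000, 2000, 2001, 2001),
  country = c("Haiti", "Guatemala", "Haiti", "Guatemala"),
  vote    = c(1, 2, 1, 1)
)

by_year_country <- toy %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

# summarize() dropped only the last grouping level (country),
# so the result is still grouped by year
group_vars(by_year_country)

# Remove the remaining grouping, then nest everything except country
nested <- by_year_country %>%
  ungroup() %>%
  nest(data = -country)

names(nested)
```

Checking group_vars() before nesting is a quick way to confirm whether an ungroup() is needed.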
Upvotes: 1
Reputation: 13125
by_year_country is still grouped, so you need to ungroup it first and then nest:
library(tidyverse)
by_year_country %>% ungroup() %>%
nest(-country) %>% head(n=2)
# A tibble: 2 x 2
country data
<chr> <list>
1 Guatemala <tibble [2 x 3]>
2 Haiti <tibble [2 x 3]>
Upvotes: 3