Tidyr's "nest" function in R doesn't recognize a variable and prints: "Warning message: Unknown or uninitialised column"

I am working with a dataset that has a column of country codes named "ccode".

To add a "country" column with country names, I use the countrycode() function from the countrycode package, which I installed from CRAN, with the following results:

votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode,"cown","country.name"))

and the following warning message:

Warning message:
In countrycode(ccode, "cown", "country.name") :
  Some values were not matched unambiguously: 260, 816

Since these country codes cannot be assigned a country name, I filtered them out of the dataframe:

> table(is.na(votes_processed$country))

 FALSE   TRUE 
350844   2703 
> votes_processed <- filter(votes_processed,!is.na(country))
> table(is.na(votes_processed$country))

 FALSE 
350844 

Afterwards I run the following commands to create another tibble with the total number of votes and the proportion of "yes" votes (vote == 1), grouped by year and country:

# Group by year and country: by_year_country
by_year_country <- votes_processed %>%
  group_by(year,country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

Then I run the following command to nest the data by country. The console prints the following warning, and the country column disappears from the result:

> nested <- by_year_country %>%
+   nest(-country)
Warning message:
Unknown or uninitialised column: 'country'. 

> nested$country
NULL
Warning messages:
1: Unknown or uninitialised column: 'country'. 
2: Unknown or uninitialised column: 'country'. 

Could someone explain what is happening with this "country" column, why R does not recognize it, and what I can do about it?

I am a beginner on this platform. I got a comment asking for a sample of the data, so I am pasting it here:

rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
        5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
        2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
            48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
                    31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))

sample_data_votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
                              "ccode"=ccode)

Thank you very much for your time and advice.

Upvotes: 1

Answers (2)

see24

Reputation: 1220

It looks like you need to remove the -country part from your call to nest:

library(dplyr)
library(tidyr)
library(countrycode)
rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
        5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
        2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
            48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
                    31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))

votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
                              "ccode"=ccode)
votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode,"cown","country.name")) %>% 
  filter(!is.na(country))

by_year_country <- votes_processed %>%
  group_by(year,country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

nested <- by_year_country %>%
  nest()

Passing -country tells nest() to nest every column except country. By default, nest() uses all columns except the grouping columns. by_year_country is a tibble that is still grouped by year: the summarize() call removes only one level of grouping, so the result is no longer grouped by country but is still grouped by year. If you want to remove the remaining grouping, use ungroup().
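The grouping behaviour described above can be checked directly with group_vars(). The following is a minimal sketch on a small made-up tibble (the toy data and the names toy, remaining and after_ungroup are hypothetical, not taken from the question); it assumes a reasonably recent dplyr, where summarize() drops only the last grouping level by default.

```r
library(dplyr)

# Hypothetical toy data standing in for votes_processed
toy <- tibble(
  year    = c(1947, 1947, 1949, 1949, 1949),
  country = c("Haiti", "Haiti", "Haiti", "Guatemala", "Guatemala"),
  vote    = c(1, 2, 1, 1, 3)
)

by_year_country <- toy %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

# summarize() peeled off the last grouping level (country);
# the result is still grouped by year
remaining <- group_vars(by_year_country)
print(remaining)

# ungroup() removes whatever grouping is left
after_ungroup <- group_vars(ungroup(by_year_country))
print(after_ungroup)
```

If remaining contains "year", any later call that treats grouping columns specially, such as nest(), will see that grouping unless ungroup() is called first.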

Upvotes: 1

A. Suliman

Reputation: 13125

by_year_country is still grouped, so you need to ungroup it first and then nest:

library(tidyverse)
by_year_country %>%
  ungroup() %>%
  nest(-country) %>%
  head(n = 2)

# A tibble: 2 x 2
  country   data            
 <chr>     <list>          
1 Guatemala <tibble [2 x 3]>
2 Haiti     <tibble [2 x 3]>
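As a side note, tidyr 1.0.0 and later expect the nested columns to be passed as a named argument, so the same idea is usually written as nest(data = -country). The sketch below assumes tidyr >= 1.0.0 and uses a small hypothetical tibble in place of the real by_year_country; the values are made up for illustration only.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for by_year_country, still grouped by year
by_year_country <- tibble(
  year        = c(1947, 1949, 1947, 1949),
  country     = c("Haiti", "Haiti", "Guatemala", "Guatemala"),
  total       = c(2L, 1L, 1L, 2L),
  percent_yes = c(0.5, 1, 1, 0.5)
) %>%
  group_by(year)

# Drop the leftover grouping, then nest everything except country
nested <- by_year_country %>%
  ungroup() %>%
  nest(data = -country)

print(names(nested))
print(nested$country)
```

Each element of nested$data is then a tibble holding the year, total and percent_yes rows for one country.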

Upvotes: 3
