Split big column of rbinded columns to make wide data in R

Question

I have a data frame which has a column of city names. Country names are specified for the cities. However, this is done in a way where one cell has a country name in it and then the following cells have city names from that country. It looks like this:

countries = tibble(
  city = c('Belgium', 'Bruxelles', 'Antwerpen', 'Bulgaria', 'Sofia', 'Plovdiv')
)

# A tibble: 6 x 1
#  city     
#       
# 1 Belgium  
# 2 Bruxelles
# 3 Antwerpen
# 4 Bulgaria 
# 5 Sofia    
# 6 Plovdiv

I want to have a column that identifies the country for each city, like this:

#  city     country
#       
# 1 Bruxelles  Belgium
# 2 Antwerpen  Belgium
# 3 Sofia  Bulgaria    
# 4 Plovdiv  Bulgaria

I figured out how to mark whether a cell in the city column in the first example is actually a city or a country, but then I got stuck.

ulfelder · Accepted Answer

Here's a solution that uses a list of countries in the world.cities table associated with the maps package and some tidyverse functionality:

library(maps)
library(tidyverse)

countries %>%
  mutate(country = ifelse(city %in% world.cities$country.etc, city, NA)) %>%
  fill(country) %>%
  filter(country != city)

Result:

# A tibble: 4 x 2
  city      country 
          
1 Bruxelles Belgium 
2 Antwerpen Belgium 
3 Sofia     Bulgaria
4 Plovdiv   Bulgaria

If some of your country names don't match the ones used in world.cities, you could also try using one or more of the country-name columns in the codelist table associated with the countrycode package, e.g., codelist$country.name.en.

Split big column of rbinded columns to make wide data in R

Answers (2)

Related Questions