petyar
petyar

Reputation: 535

Split big column of rbinded columns to make wide data in R

I have a data frame which has a column of city names. Country names are specified for the cities. However, this is done in a way where one cell has a country name in it and then the following cells have city names from that country. It looks like this:

countries = tibble(
  city = c('Belgium', 'Bruxelles', 'Antwerpen', 'Bulgaria', 'Sofia', 'Plovdiv')
)

# A tibble: 6 x 1
#  city     
#   <chr>    
# 1 Belgium  
# 2 Bruxelles
# 3 Antwerpen
# 4 Bulgaria 
# 5 Sofia    
# 6 Plovdiv 

I want to have a column that identifies the country for each city, like this:

#  city     country
#   <chr>    <chr>
# 1 Bruxelles  Belgium
# 2 Antwerpen  Belgium
# 3 Sofia  Bulgaria    
# 4 Plovdiv  Bulgaria

I figured out how to mark whether a cell in the city column in the first example is actually a city or a country, but then I got stuck.

Upvotes: 0

Views: 45

Answers (2)

ulfelder
ulfelder

Reputation: 5335

Here's a solution that uses a list of countries in the world.cities table associated with the maps package and some tidyverse functionality:

library(maps)
library(tidyverse)

countries %>%
  mutate(country = ifelse(city %in% world.cities$country.etc, city, NA)) %>%
  fill(country) %>%
  filter(country != city)

Result:

# A tibble: 4 x 2
  city      country 
  <chr>     <chr>   
1 Bruxelles Belgium 
2 Antwerpen Belgium 
3 Sofia     Bulgaria
4 Plovdiv   Bulgaria

If some of your country names don't match the ones used in world.cities, you could also try using one or more of the country-name columns in the codelist table associated with the countrycode package, e.g., codelist$country.name.en.

Upvotes: 2

fra
fra

Reputation: 852

You can select odd and even rows and use their indexes to subset the original df.

df <- data.frame(city=c("state1","city1","state2","city2","state3","city3"))

> df
  city
 1 state1
 2  city1
 3 state2
 4  city2
 5 state3
 6  city3

new_df <-data.frame(state=df[seq(1,nrow(df),2),],city=df[seq(2,nrow(df),2),])

> new_df
state  city
1 state1 city1
2 state2 city2
3 state3 city3

If you have more cities, you can change the sequence number, but all states need the same number of cities for this approach to work.

Upvotes: 0

Related Questions