Reputation: 535
I have a data frame which has a column of city names. Country names are specified for the cities. However, this is done in a way where one cell has a country name in it and then the following cells have city names from that country. It looks like this:
countries = tibble(
city = c('Belgium', 'Bruxelles', 'Antwerpen', 'Bulgaria', 'Sofia', 'Plovdiv')
)
# A tibble: 6 x 1
# city
# <chr>
# 1 Belgium
# 2 Bruxelles
# 3 Antwerpen
# 4 Bulgaria
# 5 Sofia
# 6 Plovdiv
I want to have a column that identifies the country for each city, like this:
# city country
# <chr> <chr>
# 1 Bruxelles Belgium
# 2 Antwerpen Belgium
# 3 Sofia Bulgaria
# 4 Plovdiv Bulgaria
I figured out how to mark whether a cell in the city column in the first example is actually a city or a country, but then I got stuck.
Upvotes: 0
Views: 45
Reputation: 5335
Here's a solution that uses a list of countries in the world.cities
table associated with the maps
package and some tidyverse
functionality:
library(maps)
library(tidyverse)
countries %>%
mutate(country = ifelse(city %in% world.cities$country.etc, city, NA)) %>%
fill(country) %>%
filter(country != city)
Result:
# A tibble: 4 x 2
city country
<chr> <chr>
1 Bruxelles Belgium
2 Antwerpen Belgium
3 Sofia Bulgaria
4 Plovdiv Bulgaria
If some of your country names don't match the ones used in world.cities
, you could also try using one or more of the country-name columns in the codelist
table associated with the countrycode
package, e.g., codelist$country.name.en
.
Upvotes: 2
Reputation: 852
You can select odd
and even
rows and use their indexes to subset the original df.
df <- data.frame(city=c("state1","city1","state2","city2","state3","city3"))
> df
city
1 state1
2 city1
3 state2
4 city2
5 state3
6 city3
new_df <-data.frame(state=df[seq(1,nrow(df),2),],city=df[seq(2,nrow(df),2),])
> new_df
state city
1 state1 city1
2 state2 city2
3 state3 city3
If you have more cities, you can change the sequence number, but all states need the same number of cities for this approach to work.
Upvotes: 0