Reputation: 3
I have a data set with a column of country names, and I want to create a new column that classifies each country as first world, second world, or third world. I'm relatively new to R and am having trouble finding a suitable function for working with character data.
The country column looks like this, and I have three vectors listing the countries in each category, as shown below:
nt_final_table$`Country name`
#[1] "Finland" "Denmark" "Switzerland"
#[4] "Iceland" "Netherlands" "Norway"
#[7] "Sweden" "Luxembourg" "New Zealand"
#[10] "Austria" "Australia" "Israel"
first_world_countries <- c("Australia","Austria","Belgium","Canada","Denmark","France","Germany","Greece","Iceland","Ireland","Israel","Italy","Japan","Luxembourg","Netherlands","New Zealand","Norway","Portugal","South Korea",
"Spain","Sweden","Switzerland","Turkey","United Kingdom","USA")
Second_world_countries <- c("Albania","Armenia","Azerbaijan","Belarus","Bosnia and Herzegovina","Bulgaria","China","Croatia","Cuba","Czech Republic","EastGermany","Estonia","Georgia","Hungary","Kazakhstan","Kyrgyzstan","Laos","Poland","Romania","Russia","Serbia","Slovakia","Slovenia","Tajikistan","Turkmenistan","Ukraine","Uzbekistan","Vietnam")
Third_world_countries <- c("Somalia","Niger","South Sudan")
I would like a new column containing one of the values First World, Second World, or Third World, based on the Country name column.
Any help would be appreciated! Thanks!
Upvotes: 0
Views: 1067
Reputation: 9857
Here are two ways you could do this.
You could use case_when() from the dplyr package.
library(dplyr)
country_name <-c("Finland", "Denmark", "Switzerland","Iceland", "Netherlands", "Norway", "Sweden", "Luxembourg", "New Zealand",
"Austria", "Australia", "Israel")
nt_final_table <- data.frame(country_name)
first_world_countries <- c("Australia","Austria","Belgium","Canada","Denmark","France","Germany","Greece","Iceland","Ireland","Israel","Italy","Japan","Luxembourg","Netherlands","New Zealand","Norway","Portugal","South Korea", "Spain","Sweden","Switzerland","Turkey","United Kingdom","USA")
second_world_countries <- c("Albania","Armenia","Azerbaijan","Belarus","Bosnia and Herzegovina","Bulgaria","China","Croatia","Cuba","Czech Republic","EastGermany","Estonia","Georgia","Hungary","Kazakhstan","Kyrgyzstan","Laos","Poland","Romania","Russia","Serbia","Slovakia","Slovenia","Tajikistan","Turkmenistan","Ukraine","Uzbekistan","Vietnam")
third_world_countries <- c("Somalia","Niger","South Sudan")
nt_final_table_categorized <- nt_final_table %>%
  mutate(category = case_when(
    country_name %in% first_world_countries ~ "First",
    country_name %in% second_world_countries ~ "Second",
    country_name %in% third_world_countries ~ "Third",
    TRUE ~ "Not listed"  # fallback for countries in none of the lists
  ))
nt_final_table_categorized
Sample output
   country_name   category
1       Finland Not listed
2       Denmark      First
3   Switzerland      First
4       Iceland      First
5   Netherlands      First
6        Norway      First
7        Sweden      First
8    Luxembourg      First
9   New Zealand      First
10      Austria      First
11    Australia      First
12       Israel      First
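Finland ends up as "Not listed" in the output above because it does not appear in any of the three vectors. Before classifying, it can help to check which names would go unmatched. The following is a minimal base R sketch of such a check, reusing the data from the example; the names all_listed and unmatched are introduced here for illustration only.

```r
# The country column and the three category vectors from the example above.
country_name <- c("Finland", "Denmark", "Switzerland", "Iceland", "Netherlands",
                  "Norway", "Sweden", "Luxembourg", "New Zealand",
                  "Austria", "Australia", "Israel")
first_world_countries <- c("Australia", "Austria", "Belgium", "Canada", "Denmark",
                           "France", "Germany", "Greece", "Iceland", "Ireland",
                           "Israel", "Italy", "Japan", "Luxembourg", "Netherlands",
                           "New Zealand", "Norway", "Portugal", "South Korea",
                           "Spain", "Sweden", "Switzerland", "Turkey",
                           "United Kingdom", "USA")
second_world_countries <- c("Albania", "Armenia", "Azerbaijan", "Belarus",
                            "Bosnia and Herzegovina", "Bulgaria", "China",
                            "Croatia", "Cuba", "Czech Republic", "EastGermany",
                            "Estonia", "Georgia", "Hungary", "Kazakhstan",
                            "Kyrgyzstan", "Laos", "Poland", "Romania", "Russia",
                            "Serbia", "Slovakia", "Slovenia", "Tajikistan",
                            "Turkmenistan", "Ukraine", "Uzbekistan", "Vietnam")
third_world_countries <- c("Somalia", "Niger", "South Sudan")

# Hypothetical helper names: every country that has a category ...
all_listed <- c(first_world_countries, second_world_countries, third_world_countries)
# ... and the countries in the data that are in none of the lists.
unmatched <- setdiff(country_name, all_listed)
unmatched
# [1] "Finland"
```

Any name returned here either needs to be added to one of the vectors or is spelled differently in the data than in the lists.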
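In base R, we could create a data frame that lists the countries and their categories, then use merge() to perform a left join on the two data frames. Note that merge() sorts the result by the join column by default, so the rows come back in alphabetical order.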
country_name <-c("Finland", "Denmark", "Switzerland","Iceland", "Netherlands", "Norway", "Sweden", "Luxembourg", "New Zealand",
"Austria", "Australia", "Israel")
nt_final_table <- data.frame(country_name)
first_world_countries <- c("Australia","Austria","Belgium","Canada","Denmark","France","Germany","Greece","Iceland","Ireland","Israel","Italy","Japan","Luxembourg","Netherlands","New Zealand","Norway","Portugal","South Korea", "Spain","Sweden","Switzerland","Turkey","United Kingdom","USA")
second_world_countries <- c("Albania","Armenia","Azerbaijan","Belarus","Bosnia and Herzegovina","Bulgaria","China","Croatia","Cuba","Czech Republic","EastGermany","Estonia","Georgia","Hungary","Kazakhstan","Kyrgyzstan","Laos","Poland","Romania","Russia","Serbia","Slovakia","Slovenia","Tajikistan","Turkmenistan","Ukraine","Uzbekistan","Vietnam")
third_world_countries <- c("Somalia","Niger","South Sudan")
# Build a lookup table with one row per country and its category
country_name <- c(first_world_countries, second_world_countries, third_world_countries)
categories <- c(rep("First", length(first_world_countries)),
                rep("Second", length(second_world_countries)),
                rep("Third", length(third_world_countries)))
all_countries_categorised <- data.frame(country_name, categories)
# all.x = TRUE keeps every row of nt_final_table (a left join)
nt_final_table_categorized <- merge(nt_final_table, all_countries_categorised, by = "country_name", all.x = TRUE)
nt_final_table_categorized
Sample output
   country_name categories
1     Australia      First
2       Austria      First
3       Denmark      First
4       Finland       <NA>
5       Iceland      First
6        Israel      First
7    Luxembourg      First
8   Netherlands      First
9   New Zealand      First
10       Norway      First
11       Sweden      First
12  Switzerland      First
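If the original row order matters, one possible alternative (not part of the answer above) is to index a named lookup vector instead of calling merge(). The sketch below is base R only; the country lists are shortened for illustration, and category_lookup is a name assumed here.

```r
# Shortened country lists, for illustration only.
first_world_countries  <- c("Australia", "Austria", "Denmark", "Iceland")
second_world_countries <- c("Albania", "Poland")
third_world_countries  <- c("Somalia", "Niger", "South Sudan")

nt_final_table <- data.frame(
  country_name = c("Finland", "Denmark", "Iceland", "Austria", "Australia")
)

# Named vector: the names are the countries, the values are the categories.
category_lookup <- c(
  setNames(rep("First",  length(first_world_countries)),  first_world_countries),
  setNames(rep("Second", length(second_world_countries)), second_world_countries),
  setNames(rep("Third",  length(third_world_countries)),  third_world_countries)
)

# Indexing by name yields NA for countries missing from the lookup.
# as.character() guards against the column being a factor (the default before R 4.0);
# unname() drops the names carried over from the lookup vector.
nt_final_table$category <- unname(
  category_lookup[as.character(nt_final_table$country_name)]
)
nt_final_table
```

Rows stay in their original order, and unlisted countries such as Finland get NA, which can be replaced afterwards if a label like "Not listed" is preferred.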
Upvotes: 1