Reputation: 13
I have a data.frame with a variable V21 in which many countries are recorded. I want to reduce the number of levels by recording the continent rather than each individual country. For example, instead of 'Cuba', 'Peru' and 'Argentina' being separate levels of V21, I want them all to become the level 'South America'. Here's the code I tried to use:
recode(WaveOne.test$V21, "levels("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")= 'South America'")
Can you suggest what is wrong with my code or maybe a different method? I am a complete newbie in R and its syntax. Thank you!
========UPDATE=========
SA_countries <- c("Cuba", "Mexico", "Argentina","Jamaica", "Haiti","West Indies", "Chile", "Ecuador", "Venezuela", "Other South America", "El Salvador", "Guatemala", "Nicaragua", "Dominican Republic", "Panama", "Costa Rica", "Peru")
Asia_countries <- c("Philippines", "Vietnam", "Laos", "Cambodia", "Hmong", "Other Asia", "China", "Hong Kong", "Taiwan", "Japan", "Korea", "India", "Pakistan")
Europe_Canada <- c("Europe/Canada")
MiddleEast_Africa <- c("Middle East/Africa")
continents <- list(`South America`= SA_countries, `Asia` = Asia_countries, `Europe_Canada` = Europe_Canada, `Middle East & Africa` = MiddleEast_Africa)
levels(WaveOne.test$V21) <- c(levels(WaveOne.test$V21), names(continents))
for(i in seq_along(continents)) WaveOne.test$V21[WaveOne.test$V21 %in% continents[[i]]] <- names(continents)[i]
levels(WaveOne.test$V21)
My output however is:
levels(WaveOne.test$V21)
[1] "Cuba" "Mexico" "Nicaragua" "Colombia" "Dominican Republic" "El Salvador" "Guatemala"
[8] "Honduras" "Costa Rica" "Panama" "Argentina" "Chile" "Ecuador" "Peru"
[15] "Venezuela" "Other South America" "Haiti" "Jamaica" "West Indies" "Philippines" "Vietnam"
[22] "Laos" "Cambodia" "Hmong" "Other Asia" "China" "Hong Kong" "Taiwan"
[29] "Japan" "Korea" "India" "Pakistan" "Middle East/Africa" "Europe/Canada" "South America"
[36] "Asia" "Europe_Canada" "Middle East & Africa"
Upvotes: 1
Views: 91
Reputation: 28441
You can create a named list that maps each continent to its countries, then reassign the values accordingly:
continents <- list(`South America` = SA_countries,
                   `North America` = NA_countries,
                   Europe = Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents)) # necessary to add new levels
for (i in seq_along(continents)) {
  df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
}
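As a possible alternative to the loop, base R's `levels<-` also accepts a named list that maps each new level to the old levels it should absorb. This is a minimal sketch, assuming `V21` is a factor; the small vector `f` and the shortened country lists are made up for illustration and are not the asker's data:

```r
# Hypothetical example data; not the real V21 column.
f <- factor(c("Cuba", "Peru", "Germany", "Mexico", "Peru"))

continents <- list(
  `South America` = c("Cuba", "Peru"),
  `North America` = c("Mexico"),
  Europe          = c("Germany")
)

# Assigning a named list merges the listed old levels into each new level.
# Any old level that is not mentioned in the list becomes NA.
levels(f) <- continents

as.character(f)
levels(f)
```

With this form no leftover country levels remain, but every country has to appear in one of the lists, otherwise its rows turn into `NA`.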
Reproducible Example
set.seed(123)
SA_countries <- c("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")
NA_countries <- c("Mexico", "USA", "Canada")
Euro_countries <- c("Germany", "France")
# stringsAsFactors = TRUE keeps V21 a factor on R >= 4.0
df <- data.frame(V21 = sample(c(NA_countries, SA_countries, Euro_countries), 20, TRUE), stringsAsFactors = TRUE)
df
# V21
# 1 Cuba
# 2 Venezuela
# 3 Costa Rica
# 4 Germany
# 5 France
# 6 Mexico
# 7 Argentina
# 8 Germany
# 9 Chile
# 10 Costa Rica
# 11 France
# 12 Costa Rica
# 13 Ecuador
# 14 Chile
# 15 USA
# 16 Germany
# 17 Cuba
# 18 Mexico
# 19 Colombia
# 20 France
continents <- list(`South America`=SA_countries, `North America` = NA_countries, Europe=Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents))
for(i in seq_along(continents)) df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
df
# V21
# 1 South America
# 2 South America
# 3 South America
# 4 Europe
# 5 Europe
# 6 North America
# 7 South America
# 8 Europe
# 9 South America
# 10 South America
# 11 Europe
# 12 South America
# 13 South America
# 14 South America
# 15 North America
# 16 Europe
# 17 South America
# 18 North America
# 19 South America
# 20 Europe
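The loop only changes the values. The original country names stay behind as unused levels, which is why `levels()` in the question's update still lists every country. Here is a minimal sketch of one way to tidy up, assuming `V21` is a factor: `droplevels()` removes levels that no longer occur, and `setdiff()` shows any countries missing from the lists (the `SA_countries` in the update, for instance, does not include "Colombia" or "Honduras"). The toy data below is hypothetical:

```r
# Hypothetical toy data standing in for the real column.
df <- data.frame(V21 = factor(c("Cuba", "Peru", "Colombia", "Japan")))

continents <- list(
  `South America` = c("Cuba", "Peru"),   # "Colombia" left out on purpose
  Asia            = c("Japan")
)

# Countries present in the data but absent from every continent list.
unmapped <- setdiff(levels(df$V21), unlist(continents))
unmapped

# Same recoding loop as above.
levels(df$V21) <- c(levels(df$V21), names(continents))
for (i in seq_along(continents)) {
  df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
}

# Drop the country levels that are no longer used.
df$V21 <- droplevels(df$V21)
levels(df$V21)
```

Any country reported in `unmapped` keeps its own level after `droplevels()`, so it is worth adding it to the appropriate list before running the loop.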
Upvotes: 1