Azoff

Aggregate factor levels of a variable in R

I have a data.frame with a variable V21 that records many countries. I want to reduce the number of levels by replacing each country with its continent. For example, rather than 'Cuba', 'Peru' and 'Argentina' being separate levels of V21, I want them all to become the level 'South America'. Here's the code I tried:

recode(WaveOne.test$V21, "levels("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")= 'South America'")

levels(WaveOne.test$V21)

Can you tell me what is wrong with my code, or suggest a different method? I am a complete newbie to R and its syntax. Thank you!
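The `recode()` call above (presumably `car::recode`) cannot even be parsed: the recode specification is a double-quoted string that itself contains double quotes, so R ends the string at `levels(` and then hits the unquoted `Cuba`. Inside that specification string, country names need single quotes, as in `"c('Cuba','Peru')='South America'"`. As a different method that needs only base R, you can rename the levels directly: giving several old levels the same new name merges them into one level. A minimal sketch, assuming a small hypothetical factor `v21` in place of `WaveOne.test$V21`:

```r
# Hypothetical stand-in for WaveOne.test$V21
v21 <- factor(c("Cuba", "Peru", "China", "Argentina", "Japan", "Europe/Canada"))
sa_countries <- c("Cuba", "Colombia", "Costa Rica", "Argentina",
                  "Chile", "Ecuador", "Peru", "Venezuela")

# Rename every level that appears in sa_countries to "South America";
# duplicated level names are merged into a single level
levels(v21)[levels(v21) %in% sa_countries] <- "South America"

# levels are now: "South America", "China", "Europe/Canada", "Japan"
levels(v21)
```

Countries listed in `sa_countries` but absent from the data (here "Colombia", "Chile", ...) are simply ignored, because only existing levels are matched.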

========UPDATE=========

SA_countries <- c("Cuba", "Mexico", "Argentina","Jamaica", "Haiti","West Indies", "Chile", "Ecuador", "Venezuela", "Other South America", "El Salvador", "Guatemala", "Nicaragua", "Dominican Republic", "Panama", "Costa Rica", "Peru")

Asia_countries <- c("Philippines", "Vietnam", "Laos", "Cambodia", "Hmong", "Other Asia", "China", "Hong Kong", "Taiwan", "Japan", "Korea", "India", "Pakistan")
Europe_Canada <- c("Europe/Canada")
MiddleEast_Africa <- c("Middle East/Africa")

continents <- list(`South America`= SA_countries, `Asia` = Asia_countries, `Europe_Canada` = Europe_Canada, `Middle East & Africa` = MiddleEast_Africa)
levels(WaveOne.test$V21) <- c(levels(WaveOne.test$V21), names(continents))
for(i in seq_along(continents)) WaveOne.test$V21[WaveOne.test$V21 %in% continents[[i]]] <- names(continents)[i]

levels(WaveOne.test$V21)

However, my output is:

levels(WaveOne.test$V21)

[1] "Cuba" "Mexico" "Nicaragua" "Colombia" "Dominican Republic" "El Salvador" "Guatemala"
[8] "Honduras" "Costa Rica" "Panama" "Argentina" "Chile" "Ecuador" "Peru"
[15] "Venezuela" "Other South America" "Haiti" "Jamaica" "West Indies" "Philippines" "Vietnam"
[22] "Laos" "Cambodia" "Hmong" "Other Asia" "China" "Hong Kong" "Taiwan"
[29] "Japan" "Korea" "India" "Pakistan" "Middle East/Africa" "Europe/Canada" "South America"
[36] "Asia" "Europe_Canada" "Middle East & Africa"

Answers (1)

Pierre L

You can create a list that maps each continent to its countries, then reassign the values accordingly:

continents <- list(`South America`=SA_countries, 
                   `North America` = NA_countries, 
                    Europe=Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents)) #necessary to add new levels
for(i in seq_along(continents)) {
  df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
}
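If you would rather avoid the loop and the extra levels, the same kind of `continents` list can be turned into a named lookup vector with base R's `stack()` and applied in one vectorised step. This is an alternative technique, not the method above; the country lists and `df` here are small hypothetical examples. Countries missing from every list become `NA`, which makes gaps in the lists easy to spot:

```r
# Hypothetical continent lists, in the same shape as `continents` above
continents <- list(`South America` = c("Cuba", "Peru", "Argentina"),
                   `North America` = c("Mexico", "USA", "Canada"),
                   Europe          = c("Germany", "France"))

# stack() returns a data frame with columns `values` (country) and `ind` (continent)
lookup_df <- stack(continents)
lookup <- setNames(as.character(lookup_df$ind), lookup_df$values)

df <- data.frame(V21 = c("Cuba", "France", "USA", "Peru", "Japan"),
                 stringsAsFactors = TRUE)

# Index the lookup by country name; unmatched countries ("Japan") become NA
df$continent <- factor(unname(lookup[as.character(df$V21)]))
df
```

Writing the result to a new column keeps the original country variable intact, so you can check the mapping before dropping V21.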

Reproducible Example

set.seed(123)
SA_countries <- c("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")
NA_countries <- c("Mexico", "USA", "Canada")
Euro_countries <- c("Germany", "France")
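df <- data.frame(V21 = sample(c(NA_countries, SA_countries, Euro_countries), 20, replace = TRUE),
                 stringsAsFactors = TRUE)  # keep V21 a factor, as in the question (R >= 4.0 defaults to FALSE)
df  # the exact rows below may differ in R >= 3.6.0, which changed sample()'s default method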
#           V21
# 1        Cuba
# 2   Venezuela
# 3  Costa Rica
# 4     Germany
# 5      France
# 6      Mexico
# 7   Argentina
# 8     Germany
# 9       Chile
# 10 Costa Rica
# 11     France
# 12 Costa Rica
# 13    Ecuador
# 14      Chile
# 15        USA
# 16    Germany
# 17       Cuba
# 18     Mexico
# 19   Colombia
# 20     France

continents <- list(`South America`=SA_countries, `North America` = NA_countries, Europe=Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents))
for(i in seq_along(continents)) df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
df
#              V21
# 1  South America
# 2  South America
# 3  South America
# 4         Europe
# 5         Europe
# 6  North America
# 7  South America
# 8         Europe
# 9  South America
# 10 South America
# 11        Europe
# 12 South America
# 13 South America
# 14 South America
# 15 North America
# 16        Europe
# 17 South America
# 18 North America
# 19 South America
# 20        Europe
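In the question's update, `levels(WaveOne.test$V21)` still lists every country after the loop. That is expected: the loop changes the values, but a factor keeps all of its levels, used or not, until they are dropped explicitly, for example with base R's `droplevels()`. Countries that are not in any of the lookup vectors (such as "Colombia" and "Honduras", which appear in the update's output but not in `SA_countries`) will still remain as their own levels. A minimal sketch with a small hypothetical factor `f`:

```r
f <- factor(c("Cuba", "Peru", "China"))
sa_countries <- c("Cuba", "Peru")

# Same approach as above: add the new level, then overwrite the matching values
levels(f) <- c(levels(f), "South America")
f[f %in% sa_countries] <- "South America"
levels(f)  # still includes "Cuba" and "Peru", now unused

# Drop levels that no value uses any more
f <- droplevels(f)
levels(f)  # "China" "South America"
```

`droplevels()` also accepts a whole data frame, so `df <- droplevels(df)` cleans every factor column at once.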
