Replacing levels of multiple factors

Question

I need to replace the levels of several factors in one data frame so that they are all unified. For example, these are the levels of one of those factors:

> levels(workco[,5])
 [1] " "                            "1"                            "2"
 [4] "kóko"                         "kesätyö"                      "Kesätyö kokoaika"
 [7] "koko"                         "kokop"                        "kokop."
[10] "Kokopäivä"                    "kokopäiväinen"                "Kokopäiväinen"
[13] "kokopäiväinen / osa-aikainen" "kokopäivänen"                 "kokp"
[16] "kokp."                        "Kokp."                        "osa-aik"
[19] "Osa-aik / Kokopäiv."          "osa-aik."                     "Osa-aik."
[22] "osa-aikainen"                 "Osa-aikainen"                 "osa-aikainen/kokopäiväinen"
[25] "Osa/kokoaikainen"             "Osap."

Let's say I have 12 columns that are all factors, and their levels express the same meaning in different ways. As the example shows, many level names share the same letters: koko, kok, kokop... I want to end up with three unified levels: kokop, osa and kes. The levels named 1 and 2 should also be recoded to kokop and osa, respectively.

So far nothing I have tried has worked, and I am afraid I am making this more complicated than it actually is. I have tried loops using the adist() function and, separately, grep(), but I get errors. For example:

code <- c("kok","osa","ma","kes",1,2," ")
list.names <- c("1", "2", "3", "4", "5", "6","7","8","9","10","11","12")
mylist <- vector("list", length(list.names))
names(mylist) <- list.names
D <- mylist
index <- mylist

for (i in ncol(workco2)){                            
  D[[i]] <- adist(workco2[,i],code,ignore.case=TRUE)
  index[[i]] <- lapply(D[[i]],which.min)
  workco2[,i] <- data.frame(code[index[[i]]])
}

And this error message:

Error in code[index[[i]]] : invalid subscript type 'list'

Could you kindly give me a hint on how you would solve this? It is probably much simpler than I think =/ Thanks in advance!

Ruthger Righart · Accepted Answer

My guess is that you need a combination of grep() and replace(). Matching on a shared substring lets you recode all levels with similar syllables ("ko", "kok") in one step.
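As for the error in the question: lapply() returns a list, and a list cannot be used as a subscript, hence "invalid subscript type 'list'". A minimal sketch of one way around it, assuming the adist() approach is kept, is to take the row-wise minimum with apply(), which returns a plain integer vector. The vector x below is a toy stand-in for one column and is not from the original data.

```r
code <- c("kok", "osa", "ma", "kes", "1", "2", " ")
x <- c("Kokp.", "osa-aik.", "2")          # toy labels standing in for one column

D <- adist(x, code, ignore.case = TRUE)   # one row per label, one column per code
idx <- apply(D, 1, which.min)             # integer vector, not a list
matched <- code[idx]                      # closest code for each label
```

Note also that for (i in ncol(workco2)) visits only the last column; looping over seq_len(ncol(workco2)) would visit all of them. Edit distance picks the nearest code even for labels that resemble none of them, so the substring approach below may be easier to control.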

Data example

code <- as.factor(c("kok","osa","ma","kes", "koko", "osa-aikainen", "osa/kes"))

Add level

levels(code) <- c(levels(code), "kokop")
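The extra level is needed because assigning a value that is not among a factor's levels produces NA (with a warning) rather than the new value. A small sketch with a hypothetical factor f shows the difference:

```r
f <- factor(c("kok", "osa"))

# "kokop" is not yet a level, so the replaced element becomes NA
g <- suppressWarnings(replace(f, 1, "kokop"))

# after adding the level, the same replacement works
levels(f) <- c(levels(f), "kokop")
h <- replace(f, 1, "kokop")
```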

Replace all instances containing "kok" with "kokop"

new.code <- replace(code, grep("kok", code), "kokop")

Replace all instances containing "osa/kes" with "kes"

new.code <- replace(new.code, grep("osa/kes", new.code), "kes")

Use shorter patterns, e.g. "ko", to catch levels with similar syllables ("ko", "kok")

new.code <- replace(new.code, grep("ko", new.code), "kokop")
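To apply the same idea to every factor column at once, one option is to wrap the matching in a small function and run it over the data frame with lapply(). The following is only a sketch: unify_levels is a hypothetical helper, the data frame is a toy stand-in for the real workco, and the handling of mixed labels (a later rule overrides an earlier one) and of blanks (left as NA) are assumptions that the question does not settle.

```r
# Hypothetical helper: collapse free-text labels into "kokop", "osa" or "kes".
unify_levels <- function(x) {
  x <- trimws(tolower(as.character(x)))
  out <- rep(NA_character_, length(x))   # blanks and unmatched labels stay NA
  out[grepl("ko", x, fixed = TRUE) | x == "1"] <- "kokop"
  out[grepl("osa", x, fixed = TRUE) | x == "2"] <- "osa"  # assumption: "osa" overrides "ko"
  out[grepl("kes", x, fixed = TRUE)] <- "kes"             # assumption: "kes" overrides both
  factor(out, levels = c("kokop", "osa", "kes"))
}

# Toy stand-in for the real data frame ("kes\u00e4ty\u00f6" is "kesätyö")
workco <- data.frame(
  a = c("Kokp.", "osa-aik.", "1", " "),
  b = c("kes\u00e4ty\u00f6", "2", "koko", "Osap."),
  stringsAsFactors = TRUE
)

# Recode every factor column in place
is_fac <- vapply(workco, is.factor, logical(1))
workco[is_fac] <- lapply(workco[is_fac], unify_levels)
```

Building the result from character vectors avoids having to add levels first, and declaring the three target levels in factor() means every column ends up with the same level set even if one of the values never occurs in it.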
