Misha

Reputation: 3126

Setting levels inside an lapply loop in R

I'm trying to strip trailing spaces from the factor variables in a data frame. However, the levels assignment doesn't work inside my lapply function.

rm.space<-function(x){
    a<-gsub(" ","",x)
    return(a)}


lapply(names(barn),function(x){
    levels(barn[,x])<-rm.space(levels(barn[,x]))
    })

Any ideas how I can assign levels inside a lapply function?

//M

Upvotes: 1

Views: 2704

Answers (3)

Marek

Reputation: 50704

As Joris states, lapply works on a local copy of the data.frame, so it won't modify your original data. But you can assign its result back to replace your data:

barn[] <- lapply(barn, function(x) {
    levels(x) <- rm.space(levels(x))
    x
    })
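To illustrate why the empty brackets in `barn[] <-` matter, here is a minimal sketch. The small `barn` data frame and the helper `clean` are hypothetical stand-ins, not the asker's real data; `rm.space` is taken from the question.

```r
# Hypothetical stand-in for the asker's data frame
barn <- data.frame(
    sex  = factor(c("boy ", "girl ", "boy ")),
    town = factor(c("Oslo ", "Bergen", "Oslo "))
)

rm.space <- function(x) gsub(" ", "", x)

# Hypothetical helper: clean the levels of one factor and return it
clean <- function(x) {
    levels(x) <- rm.space(levels(x))
    x
}

# lapply on its own returns a plain list, not a data frame
as_list <- lapply(barn, clean)
class(as_list)

# Assigning into barn[] replaces the columns but keeps the data.frame
barn[] <- lapply(barn, clean)
class(barn)
levels(barn$town)
```

Writing `barn <- lapply(barn, clean)` instead would leave `barn` as a list, which is presumably not what is wanted here.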

This is useful when your data has columns of different types and you want to modify only the factors, e.g.:

factors <- sapply(barn, is.factor)
barn[factors] <- lapply(barn[factors], function(x) {
    levels(x) <- rm.space(levels(x))
    x
})
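A small worked example of the factor-only variant, assuming a hypothetical mixed-type `barn` (the column names and values are made up for illustration). It shows that the numeric and character columns pass through untouched:

```r
# Hypothetical mixed-type data frame; only `sex` is a factor
barn <- data.frame(
    age  = c(3, 5, 4),
    name = c("Ann ", "Bo ", "Cy "),
    sex  = factor(c("boy ", "girl ", "boy ")),
    stringsAsFactors = FALSE
)

rm.space <- function(x) gsub(" ", "", x)

# Logical vector marking which columns are factors
factors <- sapply(barn, is.factor)

# Clean levels only in the factor columns
barn[factors] <- lapply(barn[factors], function(x) {
    levels(x) <- rm.space(levels(x))
    x
})

str(barn)
```

The `name` column keeps its trailing spaces because it is a character vector, not a factor; it would need its own `gsub` call if it should be cleaned too.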

Upvotes: 0

Joris Meys

Reputation: 108543

From your code I read that the lapply is used to loop over different variables, not over the levels of the factor. So then you do need some kind of looping structure, but lapply is a bad choice:

  • you loop over a vector, names(barn), so it's better to use sapply
  • the apply family returns the result of each iteration, which you don't want here, so you're using memory for no purpose.
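The second point can be seen in a short sketch. The two-column `Df` below is a hypothetical example, not the asker's data. The value of each assignment expression is collected into a list, while the data frame outside the function stays as it was:

```r
# Hypothetical two-column data frame with leading spaces in the levels
Df <- data.frame(
    X1 = factor(c(" a", " b")),
    X2 = factor(c(" c", " d"))
)

# The assignment inside the function changes only a local copy of Df;
# lapply collects the value of each assignment (the cleaned levels)
res <- lapply(names(Df), function(x) {
    levels(Df[, x]) <- gsub(" +", "", levels(Df[, x]))
})

str(res)        # a list holding one character vector per column
levels(Df$X1)   # the original Df still has the leading spaces
```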

Anyway, if you need to assign something to a variable in your global environment from within an lapply call, you need the <<- operator. Say you want to remove the spaces from a selected set of variables:

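f <- paste("", letters[1:5])   # values with a leading space: " a", " b", ...

Df <- data.frame(
    X1 = sample(f, 10, replace = TRUE),
    X2 = sample(f, 10, replace = TRUE),
    X3 = sample(f, 10, replace = TRUE),
    stringsAsFactors = TRUE   # needed since R 4.0.0, where strings are no longer converted to factors by default
    )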

# Bad example:
lapply(c("X1","X3"),function(x){
    levels(Df[,x])<<-gsub(" +","",levels(Df[,x]))
    })

gives

> str(Df)
'data.frame':   10 obs. of  3 variables:
 $ X1: Factor w/ 3 levels "a","b","c": 2 3 1 1 1 2 3 2 2 2
 $ X2: Factor w/ 5 levels " a"," b"," c",..: 4 5 4 2 5 5 1 2 5 3
 $ X3: Factor w/ 5 levels "a","b","c","d",..: 2 3 4 1 4 1 3 3 5 4

It is better to use a for loop:

for( i in c("X1","X3")){
    levels(Df[,i])<-gsub(" +","",levels(Df[,i]))
}

This does what you need without the hassle of the <<- operator and without holding memory unnecessarily.

Upvotes: 1

Dirk is no longer here

Reputation: 368251

R is vectorised; you do not need apply():

> f <- as.factor(sample(c("  a", " b", "c", "  d"), 10, replace=TRUE))
> levels(f)
[1] "  a" " b"  "c"   "  d"
> levels(f) <- gsub(" +", "", levels(f), perl=TRUE)
> levels(f)
[1] "a" "b" "c" "d"
> f
 [1] d a c b c d d a a a
Levels: a b c d
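Note that a pattern such as " +" removes every run of spaces, including spaces inside a level. Since the question asks about trailing spaces only, a possible variation is base R's `trimws` (available from R 3.2.0) with `which = "right"`. This is a sketch with made-up levels, not part of the original answer:

```r
# Hypothetical factor: "a " and "a" differ only by a trailing space,
# and "new york " contains an interior space that should be kept
f <- factor(c("a ", "a", "b  ", "new york "))
nlevels(f)

# Strip whitespace from the right-hand end of each level only
levels(f) <- trimws(levels(f), which = "right")

levels(f)
as.character(f)
```

Levels that become identical after trimming are merged by the `levels<-` assignment, so the factor ends up with fewer levels than it started with.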

Upvotes: 6
