Reputation: 2638
Suppose you have a data.frame with a number of factors with varying numbers of levels:
V1<-factor(sample(c(1:5,9),100,TRUE))
V2<-factor(sample(c(1:5,9),100,TRUE))
V3<-factor(sample(c(1:5),100,TRUE))
V4<-factor(sample(c(1:5),100,TRUE))
dat<-data.frame(V1,V2,V3,V4)
The goal is to estimate the difference in level frequencies for two factors. However, due to different numbers of levels, the arrays from two tables based on V1/V2 and V3/V4 are not conformable, e.g.:
table(dat$V1)-table(dat$V3)
Error in table(dat$V1) - table(dat$V3) : non-conformable arrays
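To see where the error comes from, a minimal sketch (the seed and the names t1 and t3 are assumptions added for illustration, not part of the original data) compares the two tables directly. Each table has one cell per factor level, so the two arrays differ in length and cannot be subtracted element-wise:

```r
set.seed(1)  # assumption: a seed, added only so the sketch is reproducible
V1 <- factor(sample(c(1:5, 9), 100, TRUE))
V3 <- factor(sample(1:5, 100, TRUE))

t1 <- table(V1)
t3 <- table(V3)

length(t1)                      # number of cells = number of levels of V1
length(t3)                      # number of cells = number of levels of V3
setdiff(names(t1), names(t3))   # levels of V1 that V3 does not have
```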
For the operation to be valid, V3 and V4 need the same set of levels as V1 and V2. One option is:
dat$V3<-factor(dat$V3,levels=c('1','2','3','4','5','9'))
However, this requires setting the factor levels for each variable separately, which is impractical for many variables (V5,...,Vn, say). I thought
dat[,3:4]<-apply(dat[,3:4],2,factor,levels=c('1','2','3','4','5','9'))
might work more generally, but afterwards is.factor(dat$V3) is FALSE.
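A likely explanation, sketched here under the assumption of a small two-column example (the names m and fixed are hypothetical): apply() converts the data.frame to a matrix before calling the function, and a matrix cannot hold factor columns, so the result comes back as a plain matrix. lapply() works column by column on the data.frame itself and keeps each result as a factor:

```r
set.seed(1)  # assumption: a seed, added only so the sketch is reproducible
dat <- data.frame(V3 = factor(sample(1:5, 100, TRUE)),
                  V4 = factor(sample(1:5, 100, TRUE)))
levs <- c('1', '2', '3', '4', '5', '9')

# apply() coerces dat to a matrix first; the result is a matrix, not factors
m <- apply(dat, 2, factor, levels = levs)
is.matrix(m)
is.factor(m[, 1])

# lapply() iterates over the columns and preserves the factor class
fixed <- dat
fixed[] <- lapply(dat, factor, levels = levs)
sapply(fixed, is.factor)
nlevels(fixed$V3)
```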
EDIT: This function might complete the answer by SimonO101:
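correct_factors <- function(df_object, range) {
  if (!is.data.frame(df_object)) stop('Requires data.frame object')
  # range = c(first, last) column index of the factors to harmonise
  cols <- range[1]:range[2]
  # Union of all levels found in the selected columns
  levs <- unique(unlist(lapply(df_object[cols], levels)))
  # Rebuild each selected column as a factor with the shared levels;
  # list-style indexing also works when only one column is selected
  df_object[cols] <- lapply(df_object[cols], factor, levels = levs)
  df_object
}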
Upvotes: 3
Views: 11648
Reputation: 59970
Try this to harmonise the levels...
# Get vector of all levels that appear in the data.frame
levs <- unique( unlist( lapply( dat , levels ) ) )
# Set these as the levels for each column
dat2 <- data.frame( lapply( dat , factor , levels = levs ) )
table(dat2$V1)-table(dat2$V3)
#   1   2   3   4   5   9
# -15  -5   4   7  -5  14
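As a quick check of why this works, here is a minimal sketch on a two-column example (the seed and the name diffs are assumptions added for illustration). After harmonising, a level that never occurs in a column still gets a cell in its table, with a count of 0, so both tables have the same shape:

```r
set.seed(42)  # assumption: a seed, added only so the sketch is reproducible
dat <- data.frame(V1 = factor(sample(c(1:5, 9), 100, TRUE)),
                  V3 = factor(sample(1:5, 100, TRUE)))

levs <- unique(unlist(lapply(dat, levels)))
dat2 <- data.frame(lapply(dat, factor, levels = levs))

table(dat2$V3)                        # "9" now appears, with a count of 0
diffs <- table(dat2$V1) - table(dat2$V3)
sum(diffs)                            # both columns have 100 rows, so this is 0
```

Note that levs keeps the order in which levels are first encountered across the columns; wrapping it in sort() is one option if a particular ordering matters.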
Upvotes: 4