Mary A. Marion

Reputation: 800

Counting the number of factor variables in a data frame

After generating data, I combined 5 variables into a data frame. Two of those variables are factors.

Task: I want to count the number of variables in the data frame that are factors.

I ran the code once with df as a matrix and once with df as a data frame. Both error messages are listed below.

I need help using the rep function, in particular where it goes in the R command. Is the count function the correct approach here, and if not, what should I do?

Can you help with this, please? Thank you. MM

The XXXs mark my questions in the output.

> df
           var1 var2       var3 var4       var5
[1,] -1.2070657    1 -0.6319780    3 -0.9952502
[2,]  0.2774292    2  0.3485368    1  1.9176811
[3,]  1.0844412    3  0.2075986    2  0.8032506
> class(df)
[1] "matrix"

> library(plyr)
> count(df[1:5,],as.factor)
Error in df[1:5, ] : subscript out of bounds
> df
           var1 var2       var3 var4       var5
[1,] -1.2070657    1 -0.6319780    3 -0.9952502
[2,]  0.2774292    2  0.3485368    1  1.9176811
[3,]  1.0844412    3  0.2075986    2  0.8032506
> # With df as a matrix:
> #   Error in df[1:5, ] : subscript out of bounds
> # With df as a data frame:
> #   no applicable method for 'as.quoted' applied to an object of class "function"
                                            XXXXXXXXXXXXXXXXXXX

> #2]
> 
> #working example
> b=c(1,2,3,4,5,3,6)
> #Let’s count the 3s in the vector b.
> count3 <- length(which(b == 3))
> count3
[1] 2

> 
> #apply the technique
> vec=c("var1","var2","var3","var4","var5")
> countF <- length(which(var1==as.factor))
Error in var1 == as.factor : 
  comparison (1) is possible only for atomic and list types  XXXXXXXX

> #apply the technique again
> #count the number of variables that are factors in vec
> #var2 and var4 are factors
> vec=c("var1","var2","var3","var4","var5")
> countF <- length(which(vec==as.factor))
Error in vec == as.factor : 
  comparison (1) is possible only for atomic and list types
                                            XXXXXXXXXXXXXXXXXXX

I had changed columns 2 and 4 to factors before cbind-ing them, but in that process columns 2 and 4 reverted to being numeric. I used as.factor trying to get the code to run. As I read over the comments, I wondered why lapply would not be appropriate, since we're dealing with an array of variable names in a list. Do all of the apply functions return TRUEs or FALSEs? I'm still learning when to use each of them.
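A minimal sketch of what likely happened, assuming the columns were combined with cbind() as described above (the exact original code is not shown, and the data here are made up). cbind() on plain vectors builds a matrix, and a matrix cannot hold factors, so factor columns are replaced by their integer codes. Building a data frame instead keeps each column's class.

```r
# Made-up data with the same shape as the question's variables
var1 <- rnorm(3)
var2 <- as.factor(c(1, 2, 3))
var3 <- rnorm(3)
var4 <- as.factor(c(3, 1, 2))
var5 <- rnorm(3)

# cbind() on vectors returns a matrix; factors are reduced to their integer codes
m <- cbind(var1, var2, var3, var4, var5)
class(m)                 # "matrix" (plus "array" in R >= 4.0.0)
is.factor(m[, "var2"])   # FALSE: the column is now plain numbers

# data.frame() keeps each column's class, so var2 and var4 stay factors
df <- data.frame(var1, var2, var3, var4, var5)
sapply(df, is.factor)
```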

MM

Upvotes: 0

Views: 2819

Answers (2)

Dylan_Gomes

Reputation: 2242

A few problems here:

Your "subscript out of bounds" error occurs because df[1:5, ] selects rows 1 to 5, whereas columns would be df[, 1:5]. Your matrix has only 3 rows, not 5.
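A minimal sketch reproducing the error, assuming a 3 x 5 matrix shaped like the one printed in the question (the values are random, not the asker's data). The first index selects rows and the second selects columns.

```r
# 3 rows, 5 columns, named like the question's matrix
m <- matrix(rnorm(15), nrow = 3,
            dimnames = list(NULL, paste0("var", 1:5)))
dim(m)   # 3 5

# m[1:5, ] asks for rows 1 to 5, but only 3 rows exist
res <- tryCatch(m[1:5, ], error = function(e) conditionMessage(e))
res      # "subscript out of bounds"

# m[, 1:5] asks for columns 1 to 5, which all exist
cols <- m[, 1:5]
```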

The second error, no applicable method for 'as.quoted' applied to an object of class "function", refers to as.factor, which is a function. count() expects its second argument (vars) to name the variables you want to count by, not a function, so it cannot use as.factor there. You can check exactly what count wants by running ?count in the console.

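A third problem is that R will not automatically treat integers (or any other numbers) as factors; with numbers you have to specify this yourself, for example with as.factor(). Words (character data) were often converted to factors automatically when read in, although since R 4.0.0 the stringsAsFactors default is FALSE, so this no longer happens unless you ask for it.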
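A short sketch of both points, with made-up values: numbers stay numbers until you convert them, and whether character columns become factors depends on stringsAsFactors. Setting that argument explicitly gives the same result in every R version.

```r
var2 <- c(1:3)
is.factor(var2)          # FALSE: integers are not treated as factors
var2 <- as.factor(var2)  # convert explicitly
is.factor(var2)          # TRUE

# Character data: the outcome depends on stringsAsFactors
words <- c("Green", "Red", "Blue")
d_fac <- data.frame(var6 = words, stringsAsFactors = TRUE)
d_chr <- data.frame(var6 = words, stringsAsFactors = FALSE)
class(d_fac$var6)        # "factor"
class(d_chr$var6)        # "character"
```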

Here is a reproducible example:

> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)

'data.frame':   3 obs. of  5 variables:
 $ var1: num  0.716 1.43 -0.726
 $ var2: int  1 2 3
 $ var3: num  0.238 -0.658 0.492
 $ var4: num  3 1 2
 $ var5: num  1.71 1.54 1.05

Here I used the structure function str() to check what type of data I have. Note that var2 is stored as an integer because I generated it as c(1:3), whereas specifying c(3,1,2) in var4 gave a numeric (double) column.
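A quick way to see the integer versus numeric distinction directly, using the same literals as the example data frame; the L suffix is shown only for comparison.

```r
class(c(1:3))          # "integer": the colon operator produces integers
class(c(3, 1, 2))      # "numeric": plain number literals are doubles
class(c(3L, 1L, 2L))   # "integer": the L suffix makes integer literals

# Neither type is a factor
is.factor(c(1:3))      # FALSE
```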

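Here, I will tell R that I want two of the columns to be factors, and I will add a column of words, which becomes a factor automatically in R versions before 4.0.0 (in newer versions, pass stringsAsFactors = TRUE to data.frame() to get the output below).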

> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+                ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"))
> str(df)
'data.frame':   3 obs. of  6 variables:
 $ var1: num  -1.18 1.26 -0.53
 $ var2: Factor w/ 3 levels "1","2","3": 1 2 3
 $ var3: num  1.38 -0.401 -0.924
 $ var4: Factor w/ 3 levels "1","2","3": 3 1 2
 $ var5: num  1.688 0.547 0.727
 $ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1

You can then ask which columns are factors:

> sapply(df, is.factor)
 var1  var2  var3  var4  var5  var6 
FALSE  TRUE FALSE  TRUE FALSE  TRUE 

And if you want the number of factor columns, something like this will get you there:

> length(which(sapply(df, is.factor)==TRUE))
[1] 3

You have something similar: length(which(vec==as.factor)), but the problem is that you are asking which elements of vec are equal to the function as.factor, which doesn't make sense (vec only contains the column names as character strings). That is why you get the error Error in vec == as.factor : comparison (1) is possible only for atomic and list types.
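A minimal sketch of a working version of the vec idea, assuming a data frame df built like the one in this answer (values made up). Because vec holds only names, you first index the data frame with those names, then test each selected column with is.factor.

```r
df <- data.frame(var1 = rnorm(3), var2 = as.factor(1:3), var3 = rnorm(3),
                 var4 = as.factor(c(3, 1, 2)), var5 = rnorm(3))
vec <- c("var1", "var2", "var3", "var4", "var5")

# df[vec] selects the named columns; is.factor is applied to each one
is_fac <- sapply(df[vec], is.factor)

# Same pattern as length(which(b == 3)) in the question's working example
countF <- length(which(is_fac))
countF   # 2
```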

as.factor converts something to a factor (as shown above), whereas is.factor asks whether something is a factor and returns a logical (TRUE or FALSE), also shown above.
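A small sketch of the difference, with a made-up vector. as.factor() returns a new object and leaves its input unchanged, while is.factor() only answers a question. Written without parentheses, as.factor is just the function object, which is what the == comparison in the question ended up comparing against.

```r
v <- c(3, 1, 2)
f <- as.factor(v)        # as.factor(): converts and returns a new factor
is.factor(f)             # TRUE
is.factor(v)             # FALSE: v itself was not modified

# Without parentheses, as.factor is the function itself, not a result
is.function(as.factor)   # TRUE
```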

Upvotes: 0

slava-kohut

Reputation: 4233

If you want to count the number of factor variables, you can use sapply combined with is.factor:

sum(sapply(df, is.factor))

where df is your target data frame.
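sum() works here because sapply() returns a logical vector, and when summed TRUE counts as 1 and FALSE as 0. The sketch below, using a made-up data frame, also touches the question about lapply(): it returns a list rather than a vector, and the apply functions in general return whatever the applied function returns, not necessarily TRUE/FALSE.

```r
df <- data.frame(var1 = rnorm(3), var2 = as.factor(1:3), var3 = rnorm(3),
                 var4 = as.factor(c(3, 1, 2)), var5 = rnorm(3))

# lapply() always returns a list, one element per column
l <- lapply(df, is.factor)
# sapply() simplifies that list to a named logical vector where it can
s <- sapply(df, is.factor)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of factor columns
n_factors <- sum(s)
# unlist() turns the lapply() result into the same vector
identical(unlist(l), s)   # TRUE

# With class() instead of is.factor(), the results are character strings
sapply(df, class)
```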

Upvotes: 1
