Reputation: 800
After generating data, I combined 5 variables into a data frame. Two of those variables are factors.
Task: I want to count the number of variables in the data frame that are factors.
I ran the code letting df equal both a matrix and a data frame. I'm listing both error messages.
I need help using the rep function, in particular where it belongs in the R command. Is the count function the correct approach here, and if not, what should I do?
Can you help with this, please? Thank you. MM
The XXX's mark my questions in the output.
> df
var1 var2 var3 var4 var5
[1,] -1.2070657 1 -0.6319780 3 -0.9952502
[2,] 0.2774292 2 0.3485368 1 1.9176811
[3,] 1.0844412 3 0.2075986 2 0.8032506
> class(df)
[1] "matrix"
> library(plyr)
> count(df[1:5,],as.factor)
Error in df[1:5, ] : subscript out of bounds
> df
var1 var2 var3 var4 var5
[1,] -1.2070657 1 -0.6319780 3 -0.9952502
[2,] 0.2774292 2 0.3485368 1 1.9176811
[3,] 1.0844412 3 0.2075986 2 0.8032506
> # df as a matrix: Error in df[1:5, ] : subscript out of bounds
> # df as a data frame: no applicable method for 'as.quoted' applied to an object of class "function"
XXXXXXXXXXXXXXXXXXX
> #2]
>
> #working example
> b=c(1,2,3,4,5,3,6)
> #Let’s count the 3s in the vector b.
> count3 <- length(which(b == 3))
> count3
[1] 2
>
> #apply the technique
> vec=c("var1","var2","var3","var4","var5")
> countF <- length(which(var1==as.factor))
Error in var1 == as.factor :
comparison (1) is possible only for atomic and list types XXXXXXXX
> #apply the technique again
> #count the number of variables that are factors in vec
> #var2 and var4 are factors
> vec=c("var1","var2","var3","var4","var5")
> countF <- length(which(vec==as.factor))
Error in vec == as.factor :
comparison (1) is possible only for atomic and list types
XXXXXXXXXXXXXXXXXXX
I had changed columns 2 and 4 to be factors prior to cbinding, but in that process columns 2 and 4 reverted to being numeric. I used as.factor trying to get the code to run. As I read over the comments, I wondered why lapply would not be appropriate, since we're dealing with an array of variable names in a list. Do all of the apply functions return TRUEs or FALSEs? I'm still learning when to apply each of them.
MM
Upvotes: 0
Views: 2819
Reputation: 2242
A few problems here:
Your "subscript out of bounds" problem arises because df[1:5, ] selects rows 1:5, whereas columns would be df[ ,1:5]. It appears that you only have 3 rows, not 5.
The second error, no applicable method for 'as.quoted' applied to an object of class "function", refers to as.factor, which is a function. It is saying that a function doesn't belong within the function count. You can check exactly what count wants by running ?count in the console.
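A third problem that I see is that R will not automatically treat integers as factors; you have to convert numeric columns explicitly, for example with as.factor. Character columns are different: in R versions before 4.0.0, data.frame() and read.table() converted them to factors by default (stringsAsFactors = TRUE), whereas from R 4.0.0 on they stay character unless you request the conversion.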
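To make the points about indexing and column types concrete, here is a minimal sketch. The object names f, m and d are made up for illustration, and it assumes the columns were combined with cbind() as the question describes. cbind() on plain vectors builds a matrix and reduces any factor to its integer codes, while data.frame() keeps the factor.

```r
# Hypothetical objects (f, m, d), for illustration only
set.seed(42)                           # arbitrary seed so the run is repeatable
f <- factor(c(1, 2, 3))                # a genuine factor

m <- cbind(var1 = rnorm(3), var2 = f)  # cbind() on vectors builds a matrix
is.matrix(m)                           # TRUE
is.factor(m[, "var2"])                 # FALSE: only the integer codes survive
dim(m)                                 # number of rows first, then columns

m[1:3, ]                               # rows 1 to 3
m[, 1:2]                               # columns 1 to 2

d <- data.frame(var1 = rnorm(3), var2 = f)  # data.frame() keeps the factor
is.factor(d$var2)                      # TRUE
```

Checking dim() before indexing is a quick way to avoid asking for rows or columns that are not there.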
Here is a reproducible example:
> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)
'data.frame': 3 obs. of 5 variables:
$ var1: num 0.716 1.43 -0.726
$ var2: int 1 2 3
$ var3: num 0.238 -0.658 0.492
$ var4: num 3 1 2
$ var5: num 1.71 1.54 1.05
Here I used the structure function, str(), to check what type of data I have. Note that var2 is stored as an integer because I generated it as c(1:3), whereas var4, specified as c(3,1,2), is stored as numeric.
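The integer versus numeric distinction can also be checked directly on the vectors, without building a data frame. This is only a small sketch of base R behaviour:

```r
class(1:3)            # "integer": the colon operator produces integers
class(c(3, 1, 2))     # "numeric": plain number literals are doubles
class(c(3L, 1L, 2L))  # "integer": the L suffix requests integers
is.factor(1:3)        # FALSE: neither type is a factor until converted
```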
Here, I will tell R I want two of the columns to be factors, and I will make another column of words. Words became factors automatically in R versions before 4.0.0; passing stringsAsFactors=TRUE gives the same result in later versions.
> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+ ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"), stringsAsFactors=TRUE)
> str(df)
'data.frame': 3 obs. of 6 variables:
$ var1: num -1.18 1.26 -0.53
$ var2: Factor w/ 3 levels "1","2","3": 1 2 3
$ var3: num 1.38 -0.401 -0.924
$ var4: Factor w/ 3 levels "1","2","3": 3 1 2
$ var5: num 1.688 0.547 0.727
$ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1
You can then ask which columns are factors:
> sapply(df, is.factor)
var1 var2 var3 var4 var5 var6
FALSE TRUE FALSE TRUE FALSE TRUE
And if you want the number of columns that are factors, something like this would get you there:
> length(which(sapply(df, is.factor)==TRUE))
[1] 3
You have something similar: length(which(vec==as.factor)), but one problem with this is that you are asking which things in the vec object are the same as the function as.factor, which doesn't make sense. So it is giving you the error:
Error in vec == as.factor :
comparison (1) is possible only for atomic and list types
as.factor is for setting things as factor (as I have shown above), but is.factor is for asking if something is a factor, which will return a logical (TRUE vs FALSE) - also shown above.
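If you want to keep the vec of column names from the question, one possible approach is to use it to select the columns and then test each one with is.factor. The data frame name demo and its values are invented for this sketch; only vec and countF come from the question.

```r
# Hypothetical data frame `demo`; var2 and var4 are made factors explicitly
demo <- data.frame(
  var1 = c(-1.2, 0.3, 1.1),
  var2 = as.factor(c(1, 2, 3)),
  var3 = c(-0.6, 0.3, 0.2),
  var4 = as.factor(c(3, 1, 2)),
  var5 = c(-1.0, 1.9, 0.8)
)
vec <- c("var1", "var2", "var3", "var4", "var5")

# vec holds names only, so use it to select the columns, then test each one
is_fac <- sapply(demo[vec], is.factor)
countF <- sum(is_fac)    # TRUE counts as 1, FALSE as 0
countF                   # 2
names(is_fac)[is_fac]    # "var2" "var4"
```

The comparison vec == as.factor fails because vec contains only character names; the columns themselves have to be pulled out of the data frame before anything can be asked about their type.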
Upvotes: 0
Reputation: 4233
If you want to count the number of factor variables, you can use sapply combined with is.factor:
sum(sapply(df, is.factor))
where df is your target data frame.
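On the question of whether the apply functions always return TRUE or FALSE: they return whatever the applied function returns, and differ mainly in how they package the results. A rough sketch, using a made-up data frame called demo:

```r
# Hypothetical data frame `demo`, for illustration only
demo <- data.frame(
  x = c(1.5, 2.5, 3.5),
  g = factor(c("a", "b", "a")),
  h = factor(c("u", "v", "w"))
)

as_list <- lapply(demo, is.factor)              # a list, one logical per column
as_vec  <- sapply(demo, is.factor)              # simplified to a named logical vector
strict  <- vapply(demo, is.factor, logical(1))  # like sapply, but the result type is declared

is.list(as_list)   # TRUE
sum(as_vec)        # 2: TRUE counts as 1, FALSE as 0
sum(strict)        # 2
```

lapply works too, but its list result needs unlist() before sum() can be applied, which is why sapply or vapply is the more convenient choice here.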
Upvotes: 1