dreamfuleyes

Reputation: 172

How can I compare two factors with different levels?

Is it possible to compare two factors of the same length but with different levels? For example, if we have these 2 factor variables:

A <- factor(1:5)

str(A)
 Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5

B <- factor(c(1:3,6,6))

str(B)
 Factor w/ 4 levels "1","2","3","6": 1 2 3 4 4

If I try to compare them using, for example, the == operator:

mean(A == B)

I get the following error:

Error in Ops.factor(A, B) : level sets of factors are different

Upvotes: 4

Views: 8281

Answers (2)

divibisan

Reputation: 12155

Converting to character, as in @zx8754's answer, is the easiest solution to this problem and probably the one you'd almost always want to use. Another option, though, is to rebuild the 2 variables so that they have the same levels. You might want to do this if you need to keep these variables as factors and don't want to clutter your code with repeated calls to as.character.

A <- factor(1:5)
B <- factor(c(1:3,6,6))

mean(A == B)
Error in Ops.factor(A, B) : level sets of factors are different

We can take the union of the levels of both factors to get every level that appears in either one, and then remake the factors using that union as the levels. Now, even though the 2 factors have different values, they share the same levels and you can compare them:

C <- factor(A, levels = union(levels(A), levels(B)))
D <- factor(B, levels = union(levels(A), levels(B)))

mean(C == D)
[1] 0.6

As you can see, the values are unchanged, but the levels are now identical.

C
[1] 1 2 3 4 5
Levels: 1 2 3 4 5 6

D
[1] 1 2 3 6 6
Levels: 1 2 3 4 5 6
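
If you need to do this for several pairs of factors, the union step can be wrapped in a small function. This is only a sketch in base R: align_levels is a hypothetical helper name, not something from base R or a package, and it assumes both inputs are already factors.

```r
# Hypothetical helper (name is an assumption, not part of base R):
# rebuild two factors on the union of their level sets.
align_levels <- function(x, y) {
  lv <- union(levels(x), levels(y))
  list(x = factor(x, levels = lv), y = factor(y, levels = lv))
}

A <- factor(1:5)
B <- factor(c(1:3, 6, 6))

aligned <- align_levels(A, B)

# Both factors now share one level set, so == no longer errors
identical(levels(aligned$x), levels(aligned$y))
mean(aligned$x == aligned$y)
```

The values themselves are untouched; only the level sets are widened, so the comparison gives the same answer as the as.character approach.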

Upvotes: 1

zx8754

Reputation: 56004

Convert to character then compare:

# data
A <- factor(1:5)
B <- factor(c(1:3,6,6))

str(A)
# Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
str(B)
# Factor w/ 4 levels "1","2","3","6": 1 2 3 4 4

mean(A == B)
# Error in Ops.factor(A, B) : level sets of factors are different

mean(as.character(A) == as.character(B))
# [1] 0.6

Or another approach would be

mean(levels(A)[A] == levels(B)[B])

which is about 2 times slower than the as.character() comparison on a dataset of 1e8 elements.
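
To check on your own machine that the two approaches agree and to get a rough feel for the timing, something like the following should work. It is a sketch, not the benchmark behind the figure above: the vector length n, the seed, and the sampled values are all assumptions chosen so it runs quickly, and timings will vary by machine and R version.

```r
set.seed(1)   # assumption: any seed works, this just makes the data reproducible
n <- 1e5      # assumption: much smaller than 1e8 so the sketch runs fast

# Two factors of the same length with different level sets
A <- factor(sample(1:5, n, replace = TRUE))
B <- factor(sample(c(1:3, 6), n, replace = TRUE))

# Approach 1: convert to character, then compare
r_char <- mean(as.character(A) == as.character(B))

# Approach 2: index the levels by the factor itself, then compare
r_idx <- mean(levels(A)[A] == levels(B)[B])

# Both approaches should give the same proportion of matches
identical(r_char, r_idx)

# Rough timings; no expected values are shown because they are machine dependent
system.time(mean(as.character(A) == as.character(B)))
system.time(mean(levels(A)[A] == levels(B)[B]))
```

For anything more careful than a rough comparison, repeat the timings several times and raise n toward the size of your real data.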

Upvotes: 11
