Reputation: 75585
Suppose I want to compare two lists of equal size using the frequency of values in each list.
Consider the following script.
foo = c(24,24,24,3,10,2)
bar = c(24,24,10,3,3,2)
summary(as.factor(foo))
summary(as.factor(bar))
summary(as.factor(foo)) - summary(as.factor(bar))
As long as the set of discrete values in foo and bar is identical, this works reasonably well. Here is some output:
 2  3 10 24
 1  1  1  3
 2  3 10 24
 1  2  1  2
 2  3 10 24
 0 -1  0  1
However, if there is some value in bar which is not in foo, the two count vectors have different lengths. R then recycles the shorter vector (its default behavior), so the counts no longer line up with the values they belong to. Consider the case where
bar = c(24,24,11,3,10,2)
Then, our output looks like this, along with a warning message.
 2  3 10 24
 1  1  1  3
 2  3 10 11 24
 1  1  1  1  2
 2  3 10 11 24
 0  0  0  2 -1
Warning message:
In summary(as.factor(foo)) - summary(as.factor(bar)) :
longer object length is not a multiple of shorter object length
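To make explicit where the 2 under 11 comes from, here is a minimal sketch of the same subtraction on the bare count vectors. The names foo_counts and bar_counts are introduced only for illustration, and the warning is suppressed so the snippet runs quietly.

```r
# Counts as printed above, without their names.
foo_counts <- c(1, 1, 1, 3)      # values 2, 3, 10, 24
bar_counts <- c(1, 1, 1, 1, 2)   # values 2, 3, 10, 11, 24

# R recycles foo_counts to length 5, i.e. it behaves like c(1, 1, 1, 3, 1).
# The 3 that belongs to 24 is therefore subtracted against the count for 11.
recycled <- suppressWarnings(foo_counts - bar_counts)
recycled
```

So the subtraction is done by position, not by name, which is why the fourth entry is 3 - 1 = 2.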
The desired output is:
 2  3 10 11 24
 0  0  0 -1  1
In particular, note that a count of 0 has been filled in for the missing 11 in foo (so the difference is 0 - 1 = -1), and that the value for 24 is 3 - 2 = 1.
How can I achieve the desired output?
Upvotes: 4
Views: 57
Reputation: 6532
I have no idea why you want to do this, but giving both factors the same levels works:
> summary(factor(foo, levels=union(foo,bar))) - summary(factor(bar, levels=union(foo,bar)))
24  3 10  2 11
 1  0  0  0 -1
Upvotes: -1
Reputation: 1048
You can achieve this by using the same levels when turning foo and bar into factors.
> foo2 = factor(foo, levels=sort(union(foo, bar)))
> bar2 = factor(bar, levels=sort(union(foo, bar)))
> summary(foo2) - summary(bar2)
 2  3 10 11 24
 0  0  0 -1  1
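If this comes up more than once, the same idea can be wrapped in a small helper. The following is only a sketch and not part of the answer above: the name freq_diff is made up for illustration, it counts with table() instead of summary(), and it assumes both inputs are plain vectors of the same type.

```r
# Hypothetical helper: difference in value frequencies between two vectors.
freq_diff <- function(x, y) {
  # Shared, sorted set of levels so both count vectors line up by value.
  lv <- sort(union(x, y))
  # table() on a factor reports a count for every level, including zeros.
  counts_x <- table(factor(x, levels = lv))
  counts_y <- table(factor(y, levels = lv))
  # Both vectors now have the same length and order, so no recycling occurs.
  out <- as.vector(counts_x) - as.vector(counts_y)
  names(out) <- as.character(lv)
  out
}

foo <- c(24, 24, 24, 3, 10, 2)
bar <- c(24, 24, 11, 3, 10, 2)
freq_diff(foo, bar)
```

One reason to prefer table() here is that summary() on a factor lumps levels beyond maxsum (100 by default) into "(Other)", which would silently change the result for inputs with many distinct values.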
Upvotes: 4