Is it possible to auto-fill with zeroes when subtracting output of two calls to `summary` on factors?

Question

Suppose I want to compare two lists of equal size using the frequency of values in each list.

Consider the following script.

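# Two vectors of equal length to compare by value frequency
foo = c(24, 24, 24, 3, 10, 2)
bar = c(24, 24, 10, 3, 3, 2)

# Counts of each distinct value
summary(as.factor(foo))
summary(as.factor(bar))

# Difference in counts, matched by position
summary(as.factor(foo)) - summary(as.factor(bar))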

As long as foo and bar contain the same set of distinct values, this works reasonably well: for the vectors above, the differences for the values 2, 3, 10 and 24 are 0, -1, 0 and 1.

However, if some value in bar is not in foo, the two summaries have different lengths. R then recycles the shorter vector and subtracts by position, so counts are paired with the wrong values. Consider the case where

bar =  c(24,24,11,3,10,2)

Then the three expressions print the following, and the subtraction raises a warning.

 2  3 10 24 
 1  1  1  3 

 2  3 10 11 24 
 1  1  1  1  2 

 2  3 10 11 24 
 0  0  0  2 -1 
Warning message:
In summary(as.factor(foo)) - summary(as.factor(bar)) :
  longer object length is not a multiple of shorter object length
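
The mismatch above can be reproduced by hand. The sketch below is only an illustration of positional recycling; the names `f`, `b`, `f_recycled` and `manual` are introduced here and are not part of the original script.

```r
foo <- c(24, 24, 24, 3, 10, 2)
bar <- c(24, 24, 11, 3, 10, 2)

f <- summary(as.factor(foo))  # length 4: counts for 2, 3, 10, 24
b <- summary(as.factor(bar))  # length 5: counts for 2, 3, 10, 11, 24

# Arithmetic ignores the names and works by position, so the shorter
# vector is repeated from its start until it reaches length 5.
f_recycled <- rep_len(unname(f), length(b))

# Same values as summary(as.factor(foo)) - summary(as.factor(bar)),
# but without the warning, because the recycling is done explicitly.
manual <- f_recycled - unname(b)
print(manual)
```

The count of 3 for the value 24 in foo ends up in the fourth position, where it is subtracted from the count for 11 in bar, which is why the result shows 2 under 11.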

The desired output is:

 2  3 10 11 24 
 0  0  0 -1  1 

In particular, note that a 0 has been filled in for the missing 11 in foo, and that the value for 24 is 3 - 2 = 1.

How can I achieve the desired output?

Nick · Accepted Answer

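You can achieve this by using the same levels when turning foo and bar into factors. summary on a factor reports a count for every level, including levels that never occur, so both results have the same length and order.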

> foo2 = factor(foo, levels=sort(union(foo, bar)))
> bar2 = factor(bar, levels=sort(union(foo, bar)))
> summary(foo2) - summary(bar2)
 2  3 10 11 24 
 0  0  0 -1  1
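
If you need this in more than one place, the same idea can be wrapped in a small function. This is a minimal sketch under the assumption that both inputs are plain vectors of the same type; `count_diff` is a hypothetical name, not a base R function.

```r
# Hypothetical helper (not part of base R): difference in value counts
# between two vectors, with zeroes for values missing from either one.
count_diff <- function(x, y) {
  # Shared, sorted levels so both count vectors line up by value
  lv <- sort(union(x, y))
  # summary() on a factor returns one count per level, zero included
  summary(factor(x, levels = lv)) - summary(factor(y, levels = lv))
}

foo <- c(24, 24, 24, 3, 10, 2)
bar <- c(24, 24, 11, 3, 10, 2)

res <- count_diff(foo, bar)
print(res)
```

For the foo and bar from the question this gives the same named vector as the session above: 0, 0, 0, -1, 1 for the values 2, 3, 10, 11, 24.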
