texb

Reputation: 547

Is there a general inverse of the table() function?

I am aware that a little programming allows converting fixed-dimension frequency tables, as returned e.g. by table(), back into observation data. So the aim is to convert a frequency table such as this one...

(flower.freqs <- with(iris,table(Petal=cut(Petal.Width,2),Species)))
          Species
Petal          setosa versicolor virginica
  (0.0976,1.3]     50         28         0
  (1.3,2.5]         0         22        50

...back into a data.frame() whose number of rows equals the sum of the cell counts in the input table, and whose cell values are taken from the table's dimension labels:

         Petal Species
1 (0.0976,1.3]  setosa
2 (0.0976,1.3]  setosa
3 (0.0976,1.3]  setosa
# ... (150 rows) ...

With some tinkering I built a rough prototype that should also digest higher-dimensional inputs:

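tableinv <- untable <- function(x) {
    stopifnot(is.table(x))
    # one row per table cell, repeated by its count; drop the trailing Freq column
    # (drop = FALSE keeps a data.frame even for one-dimensional tables)
    obs <- as.data.frame(x)[rep(seq_along(x), c(x)), -(length(dim(x)) + 1), drop = FALSE]
    rownames(obs) <- NULL
    obs
}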

> head(tableinv(flower.freqs)); dim(tableinv(flower.freqs))
         Petal Species
1 (0.0976,1.3]  setosa
2 (0.0976,1.3]  setosa
3 (0.0976,1.3]  setosa
4 (0.0976,1.3]  setosa
5 (0.0976,1.3]  setosa
6 (0.0976,1.3]  setosa
[1] 150   2
> head(tableinv(Titanic)); nrow(tableinv(Titanic))==sum(Titanic)
  Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No
[1] TRUE

I am obviously proud that this bricolage reconstructs multi-attribute data.frame()s from higher-dimensional frequency tables such as Titanic. But is there an established (built-in, battle-tested) general inverse to table()? Ideally it would not depend on a specific library, would handle unlabeled dimensions, would be optimized so that it does not choke on bulky inputs, and would deal reasonably with tables built from non-factor as well as factor observations.

Upvotes: 9

Views: 2464

Answers (2)

agenis

Reputation: 8377

In the specific case of one-dimensional frequency data, there is an easy way. Let's take an example:

mytable = table(mtcars$cyl)
####  4  6  8 
#### 11  7 14 

A simple function to retrieve expanded data:

InvTable = function(tb, random = TRUE){
  output = rep(names(tb), tb)
  if (random) { output <- base::sample(output, replace=FALSE) }
  return(output)
}
InvTable(mytable, T)
#### [1] "4" "8" "8" "4" "4" "6" "6" ...

This is not exactly what the question asks for, but it could be helpful in many similar cases. Beware that the result is a character vector, which is not always what we need (wrap it in as.numeric() if needed).
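As a minimal sketch of the as.numeric() step, assuming the table's category names are numeric-like (the helper name inv_table_num is hypothetical, and the original observation order is not recovered):

```r
# Hypothetical helper: expand a one-dimensional table whose names look like
# numbers back into a numeric vector, ordered by category.
inv_table_num <- function(tb) {
  stopifnot(is.table(tb), length(dim(tb)) == 1L)
  as.numeric(rep(names(tb), times = as.vector(tb)))
}

mytable <- table(mtcars$cyl)
cyl <- inv_table_num(mytable)
length(cyl) == nrow(mtcars)        # one value per original observation
identical(sort(mtcars$cyl), cyl)   # same values, up to ordering
```

Both checks compare against mtcars itself, so they only confirm that the counts and values survive the round trip.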

Upvotes: 0

RHertel

Reputation: 23788

I believe that your solution is pretty good. In any case, the way I would address this question is quite similar:

tableinv <- function(x) {
  # x: data frame from as.data.frame(<table>), with the counts in the last column (Freq)
  y <- x[rep(seq_len(nrow(x)), x$Freq), -ncol(x), drop = FALSE]
  rownames(y) <- NULL
  y
}
survivors <- as.data.frame(Titanic)
surv.invtab <- tableinv(survivors)

which yields

> head(surv.invtab)
  Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No
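
One way to gain some confidence in such a function is a round-trip check: re-tabulating the expanded rows should give back the original counts. The sketch below uses a hypothetical helper, expand_freq(), that follows the same idea as tableinv() and assumes the counts sit in a column named Freq:

```r
# Hypothetical helper: expand a frequency data frame (counts in a column
# named Freq) into one row per observation.
expand_freq <- function(df) {
  out <- df[rep(seq_len(nrow(df)), df$Freq), names(df) != "Freq", drop = FALSE]
  rownames(out) <- NULL
  out
}

obs <- expand_freq(as.data.frame(Titanic))
nrow(obs) == sum(Titanic)     # one row per counted observation
all(table(obs) == Titanic)    # re-tabulating recovers every cell count
```

The second check relies on the factor columns keeping all their levels after row subsetting, so cells with a count of zero still appear in the re-tabulated result.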

Concerning the example with the flowers, using the function tableinv() as defined above, it would first be necessary to convert the data into a data frame:

flower.freqs <- with(iris,table(Petal=cut(Petal.Width,2),Species))
flower.freqs <- as.data.frame(flower.freqs)
flower.invtab <- tableinv(flower.freqs)

The result in this case is

> head(flower.invtab)
         Petal Species
1 (0.0976,1.3]  setosa
2 (0.0976,1.3]  setosa
3 (0.0976,1.3]  setosa
4 (0.0976,1.3]  setosa
5 (0.0976,1.3]  setosa
6 (0.0976,1.3]  setosa
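
The as.data.frame() step could also be folded into the function, which gives a place to deal with unlabeled dimensions and non-factor inputs. This is only a sketch, not an established inverse of table(): the name untable_df is hypothetical, and it assumes no dimension is itself called Freq. It relies on as.data.frame() naming unlabeled dimensions Var1, Var2, ... and on type.convert() to turn numeric-looking character columns back into numbers:

```r
# Hypothetical wrapper: takes a table directly, returns character columns
# instead of factors, then converts numeric-looking columns to numbers.
untable_df <- function(x) {
  stopifnot(is.table(x))
  df <- as.data.frame(x, stringsAsFactors = FALSE)
  out <- df[rep(seq_len(nrow(df)), df$Freq), names(df) != "Freq", drop = FALSE]
  rownames(out) <- NULL
  type.convert(out, as.is = TRUE)
}

cars <- untable_df(table(mtcars$cyl, mtcars$gear))  # no dimension names given
names(cars)                  # fallback names supplied by as.data.frame()
sapply(cars, is.numeric)     # were the columns converted back to numbers?
nrow(cars) == nrow(mtcars)   # one row per original observation
```

Whether automatic type conversion is desirable depends on the data; for genuinely categorical labels that happen to look numeric, the type.convert() call should be left out.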

Hope this helps.

Upvotes: 2
