Reputation: 547
I am aware that a little programming allows converting fixed-dimension frequency tables, as returned e.g. by table(), back into observation data. So the aim is to convert a frequency table such as this one...
(flower.freqs <- with(iris,table(Petal=cut(Petal.Width,2),Species)))
              Species
Petal          setosa versicolor virginica
  (0.0976,1.3]     50         28         0
  (1.3,2.5]         0         22        50
...back into a data.frame() whose number of rows equals the sum of the counts in the input table, and whose cell values are taken from the dimension labels of the input:
Petal Species
1 (0.0976,1.3] setosa
2 (0.0976,1.3] setosa
3 (0.0976,1.3] setosa
# ... (150 rows) ...
With some tinkering I built a rough prototype that should also digest higher-dimensional inputs:
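tableinv <- untable <- function(x) {
  stopifnot(is.table(x))
  df <- as.data.frame(x)  # one row per cell, counts in the last column (Freq)
  # repeat each cell's row by its count, then drop the count column;
  # drop = FALSE keeps a data.frame even for one-dimensional tables
  obs <- df[rep(seq_len(nrow(df)), c(x)), -ncol(df), drop = FALSE]
  rownames(obs) <- NULL; obs
}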
> head(tableinv(flower.freqs)); dim(tableinv(flower.freqs))
Petal Species
1 (0.0976,1.3] setosa
2 (0.0976,1.3] setosa
3 (0.0976,1.3] setosa
4 (0.0976,1.3] setosa
5 (0.0976,1.3] setosa
6 (0.0976,1.3] setosa
[1] 150 2
> head(tableinv(Titanic)); nrow(tableinv(Titanic))==sum(Titanic)
Class Sex Age Survived
1 3rd Male Child No
2 3rd Male Child No
3 3rd Male Child No
4 3rd Male Child No
5 3rd Male Child No
6 3rd Male Child No
[1] TRUE
I am obviously proud that this bricolage reconstructs multi-attribute data.frame()s from higher-dimensional frequency tables such as Titanic, but is there an established (built-in, battle-tested) general inverse to table()? Ideally it would not depend on a specific library, would know how to handle unlabeled dimensions, would be optimized so that it does not choke on bulky inputs, and would deal reasonably with table inputs that correspond to factor as well as non-factor observation inputs.
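One way to gain confidence in a hand-rolled inverse is a round-trip check: expand the table, re-tabulate the rows with table(), and compare against the original. The sketch below uses a lightly adapted copy of the tableinv() prototype so that it runs on its own; the helper name roundtrip_ok is made up for illustration, and the check only covers inputs whose dimensions carry labels.

```r
# Lightly adapted copy of the prototype, repeated so the snippet is self-contained.
tableinv <- function(x) {
  stopifnot(is.table(x))
  df <- as.data.frame(x)  # one row per cell, counts in the last column
  obs <- df[rep(seq_len(nrow(df)), as.vector(x)), -ncol(df), drop = FALSE]
  rownames(obs) <- NULL
  obs
}

# Hypothetical helper: TRUE if re-tabulating the expanded rows reproduces x.
roundtrip_ok <- function(x) {
  back <- table(tableinv(x))
  identical(dim(back), dim(x)) && all(back == x)
}

roundtrip_ok(Titanic)
roundtrip_ok(with(iris, table(Petal = cut(Petal.Width, 2), Species)))
```

Because as.data.frame() turns every dimension into a factor that keeps all of its levels, cells with a zero count still show up when the rows are tabulated again, so the dimensions match.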
Upvotes: 9
Views: 2464
Reputation: 8377
In the specific case of one-dimensional frequency data, there is an easy way. Let's take an example:
mytable = table(mtcars$cyl)
#### 4 6 8
#### 11 7 14
A simple function to retrieve expanded data:
InvTable = function(tb, random = TRUE){
  # repeat each label as many times as its count
  output = rep(names(tb), tb)
  # optionally shuffle the expanded values
  if (random) { output <- base::sample(output, replace=FALSE) }
  return(output)
}
InvTable(mytable, TRUE)
#### [1] "4" "8" "8" "4" "4" "6" "6" ...
This is not exactly what the question asks for, but I think it could be very helpful in many similar cases. Just beware that the result is a character vector, which is not always what we need (wrap it in as.numeric() if needed).
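The character-versus-numeric caveat can be handled inside the function. The following is a minimal sketch, not part of the answer above: inv_table_typed is a hypothetical name, and it assumes that letting type.convert() guess the type from the labels is acceptable for the data at hand.

```r
# Hypothetical variant of InvTable(): expands a 1-D table and restores the value type.
inv_table_typed <- function(tb) {
  # "4", "6", "8" become integers; non-numeric labels stay character
  values <- type.convert(names(tb), as.is = TRUE)
  rep(values, times = as.vector(tb))
}

cyl <- inv_table_typed(table(mtcars$cyl))
is.numeric(cyl)
length(cyl) == nrow(mtcars)
```

The values come back grouped by label rather than in the original observation order, since a frequency table does not retain that order.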
Upvotes: 0
Reputation: 23788
I believe that your solution is pretty good. In any case, the way I would address this question is quite similar:
tableinv <- function(x){
  # x is a data frame with one row per cell and the counts in column Freq
  y <- x[rep(rownames(x),x$Freq),1:(ncol(x)-1)]
  rownames(y) <- c(1:nrow(y))
  return(y)
}
survivors <- as.data.frame(Titanic)
surv.invtab <- tableinv(survivors)
which yields
> head(surv.invtab)
Class Sex Age Survived
1 3rd Male Child No
2 3rd Male Child No
3 3rd Male Child No
4 3rd Male Child No
5 3rd Male Child No
6 3rd Male Child No
Concerning the example with the flowers, using the function tableinv() as defined above, it would first be necessary to convert the table into a data frame:
flower.freqs <- with(iris,table(Petal=cut(Petal.Width,2),Species))
flower.freqs <- as.data.frame(flower.freqs)
flower.invtab <- tableinv(flower.freqs)
The result in this case is
> head(flower.invtab)
Petal Species
1 (0.0976,1.3] setosa
2 (0.0976,1.3] setosa
3 (0.0976,1.3] setosa
4 (0.0976,1.3] setosa
5 (0.0976,1.3] setosa
6 (0.0976,1.3] setosa
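The conversion step can also be folded into the function so that it accepts either form. This is only a sketch under assumptions: expand_counts and its count argument are hypothetical names, and the count column is assumed to hold non-negative whole numbers.

```r
# Hypothetical wrapper: accepts a table or a data frame with a count column.
expand_counts <- function(x, count = "Freq") {
  # tables are first flattened to one row per cell
  if (is.table(x)) x <- as.data.frame(x, responseName = count)
  stopifnot(is.data.frame(x), count %in% names(x))
  keep <- setdiff(names(x), count)  # all columns except the counts
  out <- x[rep(seq_len(nrow(x)), times = x[[count]]), keep, drop = FALSE]
  rownames(out) <- NULL
  out
}

from_table <- expand_counts(Titanic)
from_frame <- expand_counts(as.data.frame(Titanic))
nrow(from_table) == sum(Titanic)
```

Selecting columns by name instead of by position means the count column does not have to be the last one.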
Hope this helps.
Upvotes: 2