user11538509

How to subset tables based on category value using variable name?

I am trying to subset a table based on one category value. Assume we want to keep only the adults from the Titanic data. What I do is:

data("Titanic")
subset(Titanic, Age == "Adult")

This results in the error: object 'Age' not found. The same logic works with data frames: subset(as.data.frame(Titanic), Age == "Adult"). But how can we subset tables directly, i.e. without converting them to a data frame?

EDIT: Here Age (the variable containing "Adult") is dimension number three. In my case I do not know the position of the dimension in advance, so I would like to subset by variable name, as in subset(Titanic, Age == "Adult"). Any other base function is fine, i.e. I am not tied to subset, but I am looking for a base R solution.

My expected output is

structure(c(118, 154, 387, 670, 4, 13, 89, 3, 57, 14, 75, 192, 140, 80, 76, 20), .Dim = c(4L, 2L, 2L), .Dimnames = list(Class = c("1st", "2nd", "3rd", "Crew"), Sex = c("Male", "Female"), Survived = c("No", "Yes")), class = "table")

Upvotes: 1

Views: 92

Answers (2)

zx8754

Reputation: 56149

Find the index of the dimension by matching the value against the dimnames, then use slice.index:

# user input
x <- "Adult"

# ix1: the dimension whose levels contain x exactly once
# ix2: the position of x within that dimension's levels
ix1 <- which(sapply(dimnames(Titanic), function(i) sum(i == x)) == 1)
ix2 <- which(dimnames(Titanic)[[ ix1 ]] == x)

# subset, then restore the remaining dimensions
res <- Titanic[ slice.index(Titanic, ix1) == ix2 ]
dim(res) <- dim(Titanic)[ -ix1 ]

# test
all(Titanic[,,"Adult",] == res)
# [1] TRUE

# not identical, because the dimnames and the "table" class are missing
identical(Titanic[,,"Adult",], res)
# [1] FALSE

res
# , , 1
# 
#      [,1] [,2]
# [1,]  118    4
# [2,]  154   13
# [3,]  387   89
# [4,]  670    3
# 
# , , 2
# 
#      [,1] [,2]
# [1,]   57  140
# [2,]   14   80
# [3,]   75   76
# [4,]  192   20
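
If the labels matter, the dimnames and class can be put back after reshaping. The following is only a sketch that repeats the steps above in self-contained form; using vapply and match instead of sapply and which is my own substitution, and it assumes the value x occurs in exactly one dimension.

```r
data("Titanic")

x  <- "Adult"
dn <- dimnames(Titanic)

# Dimension whose levels contain x (assumed to be exactly one)
ix1 <- which(vapply(dn, function(lv) x %in% lv, logical(1)))
# Position of x within that dimension's levels
ix2 <- match(x, dn[[ix1]])

# Pick the matching cells, then rebuild the shape of the remaining dimensions
res <- Titanic[slice.index(Titanic, ix1) == ix2]
dim(res) <- dim(Titanic)[-ix1]

# Restore the labels of the remaining dimensions and the table class
dimnames(res) <- dn[-ix1]
class(res) <- "table"

res
```

With the dimnames restored, res prints with the Class, Sex and Survived labels instead of bare indices.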

Upvotes: 1

Yacine Hajji

Reputation: 1449

You are working not with a two-dimensional data frame but with a four-dimensional array, so you must put your condition in the right dimension:

Titanic[,,"Adult",]

When you display the array, you see the following four dimensions:
1- Class
2- Sex
3- Age
4- Survived

You can get the dimension names with str() or dimnames():

str(Titanic)
dimnames(Titanic)
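
Since the question asks for subsetting by variable name rather than by position, the names returned by dimnames() can be used to build the index programmatically. This is a minimal sketch, not a base R function: subset_by_dimname, dim_name and level are hypothetical names chosen for illustration, and the approach assumes the table has named dimnames.

```r
data("Titanic")

# Hypothetical helper (not part of base R): keep one level of a named dimension
subset_by_dimname <- function(tab, dim_name, level) {
  dn <- dimnames(tab)
  stopifnot(dim_name %in% names(dn), level %in% dn[[dim_name]])

  # Start with "select everything" (TRUE) for every dimension ...
  idx <- rep(list(TRUE), length(dn))
  # ... then replace the entry for the requested dimension with the level
  idx[[match(dim_name, names(dn))]] <- level

  # Equivalent to tab[, , "Adult", ] when dim_name is the third dimension
  do.call(`[`, c(list(tab), idx))
}

adults <- subset_by_dimname(Titanic, "Age", "Adult")
adults
```

Because the indexing still goes through `[`, the selected dimension is dropped and the remaining labels are kept, as with Titanic[,,"Adult",].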

Upvotes: 2
