Reputation: 2326
I have a data frame with one factor and several logical variables. The logical variables correspond, to some extent, to different conditions of a single variable, but they are not mutually exclusive.
As a simplified example, I want to count the cars owned by people in different professions, where one person can own several cars. Let's say I also want to count their phone types (also not mutually exclusive).
Dummy data:
data = data.frame(Profession = c("Manager", "Manager", "Developer", "Developer"),
                  Ford = c(TRUE, FALSE, FALSE, TRUE), Renault = c(FALSE, TRUE, TRUE, FALSE), Ferrari = c(TRUE, FALSE, FALSE, FALSE),
                  iPhone = c(TRUE, TRUE, TRUE, FALSE), Android = c(TRUE, TRUE, FALSE, TRUE))
#   Profession  Ford Renault Ferrari iPhone Android
# 1    Manager  TRUE   FALSE    TRUE   TRUE    TRUE
# 2    Manager FALSE    TRUE   FALSE   TRUE    TRUE
# 3  Developer FALSE    TRUE   FALSE   TRUE   FALSE
# 4  Developer  TRUE   FALSE   FALSE  FALSE    TRUE
I'd like to obtain a contingency table with the counts of car and phone types by profession. I am only interested in the TRUE values, not the FALSE (or NA) ones.
Ideally, I'd like to present it in a table with a hierarchical structure of variables, such as this:
            Manager  Developer  (Total)
Car
- Ford            1          1        2
- Renault         1          1        2
- Ferrari         1          0        1
Phone
- iPhone          2          1        3
- Android         2          1        3
I have tried to mess around with table(), but I must confess I am quite lost and don't know where to begin.
Upvotes: 2
Views: 2234
Reputation: 25854
You can also do this with the reshape2 package.
library(reshape2)
# `data` is the data frame from the question
recast(data, variable ~ Profession, id.var = 1, fun.aggregate = sum, margins = "Profession")
#   variable Developer Manager (all)
# 1     Ford         1       1     2
# 2  Renault         1       1     2
# 3  Ferrari         0       1     1
# 4   iPhone         1       2     3
# 5  Android         1       2     3
recast does this in one step. To see where the column names variable and value used in the formula come from, look at the output of
melt(data, 1)
and then
dcast(melt(data, 1), variable ~ Profession, value.var = 'value', fun.aggregate = sum)
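For readers who want to see what the melt and dcast steps amount to, here is a rough base-R sketch of the same idea: stack the logical columns into a long format, then sum the values per variable and profession. The names measure_cols, long, counts and with_total are made up for this illustration, and the code is only an approximation of what reshape2 does internally, not its actual implementation.

```r
# Dummy data from the question
data <- data.frame(
  Profession = c("Manager", "Manager", "Developer", "Developer"),
  Ford    = c(TRUE, FALSE, FALSE, TRUE),
  Renault = c(FALSE, TRUE, TRUE, FALSE),
  Ferrari = c(TRUE, FALSE, FALSE, FALSE),
  iPhone  = c(TRUE, TRUE, TRUE, FALSE),
  Android = c(TRUE, TRUE, FALSE, TRUE)
)

# "Melt" by hand: one row per (person, variable) pair.
# measure_cols and long are hypothetical names for this sketch.
measure_cols <- names(data)[-1]
long <- data.frame(
  Profession = rep(data$Profession, times = length(measure_cols)),
  variable   = factor(rep(measure_cols, each = nrow(data)),
                      levels = measure_cols),
  value      = unlist(data[measure_cols], use.names = FALSE)
)

# "Cast" by hand: sum the logical values per variable and profession.
counts <- tapply(long$value, list(long$variable, long$Profession), sum)

# Add a margin column comparable to margins = "Profession".
with_total <- cbind(counts, Total = rowSums(counts))
with_total
```

Summing a logical vector counts its TRUE values, which is why sum works as the aggregation function here.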
Upvotes: 3
Reputation: 3176
This should work:
# split the data by profession; the result is a list with one data frame per profession
data2 = split(data[, -1], data$Profession)
# colSums of each piece gives the number of TRUE values for Ford, Renault, etc.
# the results are bound into a data frame for convenience
tb = data.frame(lapply(data2, colSums))
# add a column for the total
tb$Total = rowSums(tb)
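The question also mentions NA values and a Car/Phone hierarchy, which the code above does not cover. A possible extension, as a sketch only: pass na.rm = TRUE to colSums and attach a group label to each row. The groups lookup vector is an assumption made for this example (the data itself does not say which column is a car and which is a phone), and one NA has been put into the dummy data purely to show the effect.

```r
# Dummy data from the question, with one NA added for illustration
data <- data.frame(
  Profession = c("Manager", "Manager", "Developer", "Developer"),
  Ford    = c(TRUE, FALSE, FALSE, TRUE),
  Renault = c(FALSE, TRUE, TRUE, FALSE),
  Ferrari = c(TRUE, FALSE, FALSE, NA),
  iPhone  = c(TRUE, TRUE, TRUE, FALSE),
  Android = c(TRUE, TRUE, FALSE, TRUE)
)

# Hypothetical lookup: which group each logical column belongs to
groups <- c(Ford = "Car", Renault = "Car", Ferrari = "Car",
            iPhone = "Phone", Android = "Phone")

# Count TRUE values per profession, ignoring NA
counts <- sapply(split(data[names(groups)], data$Profession),
                 colSums, na.rm = TRUE)

# Combine group label, item name, counts and total into one table
tb <- data.frame(
  Group = unname(groups[rownames(counts)]),
  Item  = rownames(counts),
  counts,
  Total = rowSums(counts),
  row.names = NULL
)
tb
```

Without na.rm = TRUE, the Ferrari count for Developer would come out as NA instead of 0.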
Upvotes: 2