MH765
MH765

Reputation: 3

Aggregating frequencies with multiple columns in R

I am working with a dataframe in R that has three columns: House, Appliance, and Count. The data is essentially an inventory of the different types of kitchen appliances contained within each house on a block. The data look something like this: (spaces added for illustrative purposes)

    House        Appliance        Count
    1            Toaster          2

    2            Dishwasher       1
    2            Toaster          1
    2            Refrigerator     1
    2            Toaster          1

    3            Dishwasher       1
    3            Oven             1

For each appliance type, I would like to be able to compute the proportion of houses containing at least one of those appliances. Note that in my data, it is possible for a single house to have zero, one, or multiple appliances in a single category. If a house does not have an appliance, it is not listed in the data for that house. If the house has more than one appliance, the appliance could be listed once with a count >1 (e.g., toasters in House 1), or it could be listed twice (each with count = 1, e.g., toasters in House 2).

As an example showing what I am trying to compute, in the data shown here, the proportion of houses with toasters would be .67 (rounded) because 2/3 of the houses have at least one toaster. Similarly, the proportion of houses with ovens would be 0.33 (since only 1/3 of the houses have an oven). I do not care that any of the houses have more than one toaster -- only that they have at least one.

I have fooled around with xtabs and ftable in R but am not confident that they provide the simplest solution. Part of the problem is that these functions will provide the number of appliances for each house, which then throws off my proportion of houses calculations. Here's my current approach:

    temp1 <- xtabs(~House + Appliance, data=housedata)
    temp1[temp1[,] > 1] <- 1  # This is needed to correct houses with >1 unit.
    proportion.of.houses <- data.frame(margin.table(temp1,2)/3)

This appears to work but it's not elegant. I'm guessing there is a better way to do this in R. Any suggestions much appreciated.

Upvotes: 0

Views: 114

Answers (2)

Henk
Henk

Reputation: 3656

library(data.table)
setDT(df)

n.houses = length(unique(df$House))
df[, length(unique(House))/n.houses, by = Appliance]

Upvotes: 1

mpalanco
mpalanco

Reputation: 13570

library(dplyr)
n <- length(unique(df$House)) 
df %>% 
  group_by(Appliance) %>% 
  summarise(freq = n_distinct(House)/n)

Output:

     Appliance      freq
1   Dishwasher 0.6666667
2         Oven 0.3333333
3 Refrigerator 0.3333333
4      Toaster 0.6666667

Upvotes: 1

Related Questions