R: How to find percentage of occurrences for a category

Question

I have a data frame with 3 columns: Id number, Category, and brand:

X1        X2        X3
1234      Couch     A87   
1234      Chair     A18
1234      Chair     A19
1234      Chair     A21
1235      Couch     A89
1235      Couch     A89
1235      Couch     A89
1236      Chair     A21
1236      Chair     A21
1236      Chair     A21

What I want to do is return the percentage of id numbers that have only 1 brand for a category.

I have tried to get a data frame that shows the count of brands per each id number like this:

X1        X2        total.count
1234      Couch     1
1234      Chair     3
1235      Couch     1
1236      Chair     1

Using:

DF2 = (DF %>%
group_by(X2,X3) %>%
  summarise(total.count=n()))

But it isn't working correctly. I would then want to filter out all of the id numbers that have counts more than 1. Any thoughts?

Kristofersen · Accepted Answer

I think data.table is the easiest way to solve something like this.

dat = read.table("clipboard", header = TRUE)

library(data.table)
setDT(dat)

Then, to get the output you're looking for we could do this

> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]
     X1    X2 total.count
1: 1234 Couch           1
2: 1234 Chair           3
3: 1235 Couch           1
4: 1236 Chair           1

To limit this to ID numbers with counts of 1 we could do this

dat2 = dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]

> dat2[total.count == 1]
     X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1

Or, if we wanted to do this without saving dat2, we could do this all in 1 line

> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)][total.count == 1]
     X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1

R: How to find percentage of occurrences for a category

Answers (2)

Related Questions