anrpet
anrpet

Reputation: 139

R: How to find percentage of occurrences for a category

I have a data frame with 3 columns: Id number, Category, and brand:

X1        X2        X3
1234      Couch     A87   
1234      Chair     A18
1234      Chair     A19
1234      Chair     A21
1235      Couch     A89
1235      Couch     A89
1235      Couch     A89
1236      Chair     A21
1236      Chair     A21
1236      Chair     A21

What I want to do is return the percentage of id numbers that have only 1 brand for a category.

I have tried to get a data frame that shows the count of brands per each id number like this:

X1        X2        total.count
1234      Couch     1
1234      Chair     3
1235      Couch     1
1236      Chair     1 

Using:

DF2 = (DF %>%
group_by(X2,X3) %>%
  summarise(total.count=n()))

But it isn't working correctly. I would then want to filter out all of the id numbers that have counts more than 1. Any thoughts?

Upvotes: 1

Views: 350

Answers (2)

Kristofersen
Kristofersen

Reputation: 2806

I think data.table is the easiest way to solve something like this.

dat = read.table("clipboard", header = TRUE)

library(data.table)
setDT(dat)

Then, to get the output you're looking for we could do this

> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]
     X1    X2 total.count
1: 1234 Couch           1
2: 1234 Chair           3
3: 1235 Couch           1
4: 1236 Chair           1

To limit this to ID numbers with counts of 1 we could do this

dat2 = dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]

> dat2[total.count == 1]
     X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1

Or, if we wanted to do this without saving dat2, we could do this all in 1 line

> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)][total.count == 1]
     X1    X2 total.count
1: 1234 Couch           1
2: 1235 Couch           1
3: 1236 Chair           1

Upvotes: 2

AkselA
AkselA

Reputation: 8846

df <- read.table(text="
   X1        X2       X3
   1234      Couch     A87   
   1234      Chair     A18
   1234      Chair     A19
   1234      Chair     A21
   1235      Couch     A89
   1235      Couch     A89
   1235      Couch     A89
   1236      Chair     A21
   1236      Chair     A21
   1236      Chair     A21", header=TRUE)

agg <- aggregate(X3 ~ X1 + X2, df, function(x) length(x) > 1)

mean(agg$X3) * 100
# 75

just_one <- agg$X1[!agg$X3]

df[df$X1 == just_one, ]
#     X1    X2  X3
# 1 1234 Couch A87
# 2 1234 Chair A18
# 3 1234 Chair A19
# 4 1234 Chair A21

Upvotes: 1

Related Questions