Reputation: 139
I have a data frame with 3 columns: Id number, Category, and brand:
X1 X2 X3
1234 Couch A87
1234 Chair A18
1234 Chair A19
1234 Chair A21
1235 Couch A89
1235 Couch A89
1235 Couch A89
1236 Chair A21
1236 Chair A21
1236 Chair A21
What I want to do is return the percentage of id numbers that have only 1 brand for a category.
I have tried to get a data frame that shows the count of brands per each id number like this:
X1 X2 total.count
1234 Couch 1
1234 Chair 3
1235 Couch 1
1236 Chair 1
Using:
DF2 = (DF %>%
group_by(X2,X3) %>%
summarise(total.count=n()))
But it isn't working correctly. I would then want to filter out all of the id numbers that have counts more than 1. Any thoughts?
Upvotes: 1
Views: 350
Reputation: 2806
I think data.table is the easiest way to solve something like this.
dat = read.table("clipboard", header = TRUE)
library(data.table)
setDT(dat)
Then, to get the output you're looking for we could do this
> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]
X1 X2 total.count
1: 1234 Couch 1
2: 1234 Chair 3
3: 1235 Couch 1
4: 1236 Chair 1
To limit this to ID numbers with counts of 1 we could do this
dat2 = dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)]
> dat2[total.count == 1]
X1 X2 total.count
1: 1234 Couch 1
2: 1235 Couch 1
3: 1236 Chair 1
Or, if we wanted to do this without saving dat2, we could do this all in 1 line
> dat[ , .(total.count = uniqueN(X3)), by = .(X1, X2)][total.count == 1]
X1 X2 total.count
1: 1234 Couch 1
2: 1235 Couch 1
3: 1236 Chair 1
Upvotes: 2
Reputation: 8846
df <- read.table(text="
X1 X2 X3
1234 Couch A87
1234 Chair A18
1234 Chair A19
1234 Chair A21
1235 Couch A89
1235 Couch A89
1235 Couch A89
1236 Chair A21
1236 Chair A21
1236 Chair A21", header=TRUE)
agg <- aggregate(X3 ~ X1 + X2, df, function(x) length(x) > 1)
mean(agg$X3) * 100
# 75
just_one <- agg$X1[!agg$X3]
df[df$X1 == just_one, ]
# X1 X2 X3
# 1 1234 Couch A87
# 2 1234 Chair A18
# 3 1234 Chair A19
# 4 1234 Chair A21
Upvotes: 1