Reputation: 1979
I have a dataset in R that looks like the one below.
I found similar posts, such as "Counting number of times a value occurs", but none that match my case exactly.
id <- c(1,1,1, 2,2,2, 3,3,3,3)
cat.1 <- c("a","a","a","b","b","b","c","c","c","c")
cat.2 <- c("m","m","m","f","f","f","m","m","m","m")
score <- c(-1,0,-1, 1,0,1, -1,0,1,1)
data <- data.frame("id"=id, "cat.1"=cat.1, "cat.2"=cat.2, "score"=score)
data
   id cat.1 cat.2 score
1   1     a     m    -1
2   1     a     m     0
3   1     a     m    -1
4   2     b     f     1
5   2     b     f     0
6   2     b     f     1
7   3     c     m    -1
8   3     c     m     0
9   3     c     m     1
10  3     c     m     1
I would like to count the number of -1 values in the score variable within each id. I would also like to keep the cat.1 and cat.2 variables. The desired output would be:
  id cat.1 cat.2 count(-1)
1  1     a     m         2
2  2     b     f         0
3  3     c     m         1
Do you have any suggestions? Thanks!
Upvotes: 3
Views: 1638
Reputation: 886938
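Another option is count() from dplyr. Convert score to a logical (TRUE where it equals -1) and pass it as the wt argument, so that the TRUE values are summed within each group:
library(dplyr)
data %>%
  mutate(score = score == -1) %>%
  dplyr::count(id, cat.1, cat.2, wt = score)
# A tibble: 3 x 4
#      id cat.1 cat.2     n
#   <dbl> <fct> <fct> <int>
# 1     1 a     m         2
# 2     2 b     f         0
# 3     3 c     m         1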
Upvotes: 1
Reputation: 10761
This is something we can use dplyr for:
library(dplyr)
data %>%
  group_by(id, cat.1, cat.2) %>% # or: group_by_at(vars(-score))
  summarise(count_neg_1 = sum(score == -1)) # TRUE counts as 1, so sum() counts the -1 values
#   id cat.1 cat.2 count_neg_1
# 1  1     a     m           2
# 2  2     b     f           0
# 3  3     c     m           1
You can change the name of the calculated column if you so desire. I generally avoid anything other than a letter, number, or underscore in my variable names.
Upvotes: 6
Reputation: 12703
library(data.table)
setDT(data)[ , sum(score == -1), by=c('id', 'cat.1', 'cat.2')]
#    id cat.1 cat.2 V1
# 1:  1     a     m  2
# 2:  2     b     f  0
# 3:  3     c     m  1
Upvotes: 5
Reputation: 39858
One base R possibility could be:
aggregate(score ~ ., FUN = function(x) sum(x == -1), data = data)
  id cat.1 cat.2 score
1  2     b     f     0
2  1     a     m     2
3  3     c     m     1
If your data has more variables and you want to group by just these three, you can name them explicitly in the formula: aggregate(score ~ id + cat.1 + cat.2, ...)
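As a minimal sketch of the explicit-formula form, assuming the data carries one extra column that should not take part in the grouping (the `note` column and the `count_neg_1` name are made up here for illustration), the call could look like this. The result is re-sorted by id because aggregate() does not keep the original row order:

```r
# Example data from the question, plus a hypothetical extra column `note`
data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1),
  note  = letters[1:10]
)

# Group by id, cat.1 and cat.2 only; `note` is ignored because it is not in the formula
res <- aggregate(score ~ id + cat.1 + cat.2, data = data,
                 FUN = function(x) sum(x == -1))

# Restore ordering by id and give the count column a descriptive name
res <- res[order(res$id), ]
names(res)[names(res) == "score"] <- "count_neg_1"
print(res)
```

With `score ~ .` the extra column would also be treated as a grouping variable, which is why naming the three grouping columns matters in this case.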
Upvotes: 4