amisos55

Reputation: 1979

Counting number of times a value occurs grouping by id in R

I have a dataset in R that looks like the one below. I found similar posts, such as "Counting number of times a value occurs", but none that matches my case exactly.

id <-     c(1,1,1, 2,2,2, 3,3,3,3)
cat.1 <-  c("a","a","a","b","b","b","c","c","c","c")
cat.2 <-  c("m","m","m","f","f","f","m","m","m","m")
score <-    c(-1,0,-1, 1,0,1, -1,0,1,1)


data <- data.frame("id"=id, "cat.1"=cat.1, "cat.2"=cat.2, "score"=score)
data
   id cat.1 cat.2 score
1   1     a     m    -1
2   1     a     m     0
3   1     a     m    -1
4   2     b     f     1
5   2     b     f     0
6   2     b     f     1
7   3     c     m    -1
8   3     c     m     0
9   3     c     m     1
10  3     c     m     1

I would like to count the number of -1 values in the score variable within each id, while keeping the cat.1 and cat.2 variables. The desired output would be:

   id cat.1 cat.2 count(-1)
1   1     a     m    2
2   2     b     f    0
3   3     c     m    1

Do you have any suggestions? Thanks!

Upvotes: 3

Views: 1638

Answers (4)

akrun

Reputation: 886938

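Another option is dplyr's count(): convert score to a logical indicator for -1 and pass it as the wt argument, so the TRUE values are summed within each group.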

library(dplyr)
data %>%
   mutate(score = score == -1) %>% 
   dplyr::count(id, cat.1, cat.2, wt = score)
# A tibble: 3 x 4
#    id cat.1 cat.2     n
#   <dbl> <fct> <fct> <int>
#1     1 a     m         2
#2     2 b     f         0
#3     3 c     m         1

Upvotes: 1

bouncyball

Reputation: 10761

This is something we can use dplyr for:

library(dplyr)
data %>%
    group_by(id, cat.1, cat.2) %>% # or: group_by_at(vars(-score))
    summarise(count_neg_1 = sum(score == -1)) # TRUE counts as 1, so this counts the -1 rows


#      id cat.1 cat.2 count_neg_1
# 1     1 a     m               2
# 2     2 b     f               0
# 3     3 c     m               1

You can change the name of the calculated column if you so desire. I generally avoid anything other than a letter, number, or underscore in my variable names.

Upvotes: 6

Sathish

Reputation: 12703

library(data.table)
setDT(data)[ , sum(score == -1), by=c('id', 'cat.1', 'cat.2')]
#    id cat.1 cat.2 V1
# 1:  1     a     m  2
# 2:  2     b     f  0
# 3:  3     c     m  1

Upvotes: 5

tmfmnk

Reputation: 39858

One base R possibility could be:

aggregate(score ~ ., FUN = function(x) sum(x == -1), data = data)

  id cat.1 cat.2 score
1  2     b     f     0
2  1     a     m     2
3  3     c     m     1

If your data has more variables and you want to group by just these three, name them explicitly in the formula: aggregate(score ~ id + cat.1 + cat.2, ...)
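
To illustrate the explicit formula, here is a minimal sketch. It assumes the question's data plus one extra column, site, which is hypothetical and is added only to show that it stays out of the grouping. The column name count_neg_1 and the final reordering by id are likewise illustrative choices, not part of the original answer.

```r
# Rebuild the example data, plus a hypothetical extra column `site`
# that should NOT take part in the grouping.
data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cat.1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  cat.2 = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "m"),
  score = c(-1, 0, -1, 1, 0, 1, -1, 0, 1, 1),
  site  = c("x", "y", "x", "y", "x", "y", "x", "y", "x", "y")
)

# Name the grouping variables explicitly so `site` is ignored.
res <- aggregate(score ~ id + cat.1 + cat.2, data = data,
                 FUN = function(x) sum(x == -1))

# Rename the aggregated column and put the rows back in id order.
names(res)[names(res) == "score"] <- "count_neg_1"
res <- res[order(res$id), ]
res
```

With score ~ . the same data would also be grouped by site, which splits each id into several rows; spelling out the right-hand side avoids that.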

Upvotes: 4
