amisos55

Reputation: 1979

Calculating the mean by another variable's categories in R

I have an example dataset like this:

id <-       c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8)
item.id <-  c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1)
score <-    c(0,0,0, 0,0,1, 1,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,1)
category <- c(2,2,2, 3,3,3, 1,1, 3,3, 1,1,1,1, 4,4,4, 2, 3,3)

data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score, "category"=category)
> data
   id item.id sequence score category
1   1       1        1     0        2
2   1       1        2     0        2
3   1       2        1     0        2
4   2       1        1     0        3
5   2       1        2     0        3
6   2       1        3     1        3
7   3       1        1     1        1
8   3       1        2     0        1
9   4       1        1     1        3
10  4       2        1     1        3
11  5       1        1     1        1
12  5       2        1     0        1
13  5       2        2     1        1
14  5       2        3     1        1
15  6       1        1     0        4
16  6       1        2     0        4
17  6       1        3     0        4
18  7       1        1     1        2
19  8       1        1     0        3
20  8       2        1     1        3

id identifies the person and item.id identifies the question. sequence numbers the attempts to change the response, score is the score on the item, and category is the category each student falls in.

What I want to do is take the maximum sequence number for each id and item.id combination, then calculate the mean of those maximum sequence values within each category. I completed the first step but could not figure out how to cross-tabulate the mean of the maximum sequence number by category.

library(dplyr)
data %>%
  group_by(id,item.id) %>%
  summarize(max.seq = max(sequence))
# A tibble: 12 x 3
# Groups:   id [?]
      id item.id max.seq
   <dbl>   <dbl>   <dbl>
 1     1       1       2
 2     1       2       1
 3     2       1       3
 4     3       1       2
 5     4       1       1
 6     4       2       1
 7     5       1       1
 8     5       2       3
 9     6       1       3
10     7       1       1
11     8       1       1
12     8       2       1

The result of the second step should be:

category           1     2     3     4
mean(max(seq))     2  1.33   1.4     3

Any suggestions?

Thanks in advance!

Upvotes: 0

Views: 868

Answers (1)

Dave2e

Reputation: 24069

You need to carry the category value into the summary table. Because category is constant within each id and item.id combination, taking its mean inside summarize() is one way to do that: the mean of identical values is just that value.

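library(dplyr)
data %>%
  # one row per id/item.id pair; category is constant within a pair, so mean() returns it unchanged
  group_by(id,item.id) %>%
  summarize(cat=mean(category), max.seq = max(sequence)) %>% 
  # average the per-pair maxima within each category
  group_by(cat) %>%
  summarize(mean(max.seq))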

# A tibble: 4 x 2
    cat `mean(max.seq)`
  <dbl>           <dbl>
1     1            2   
2     2            1.33
3     3            1.4 
4     4            3   
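
As an alternative to mean(category), a minimal sketch that adds category to the first group_by() so it is carried through as a grouping key. This assumes, as in the example data, that category never varies within an id. The names result and mean.max.seq are illustrative and not part of the original post.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  id       = c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8),
  item.id  = c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2),
  sequence = c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1),
  score    = c(0,0,0, 0,0,1, 1,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,1),
  category = c(2,2,2, 3,3,3, 1,1, 3,3, 1,1,1,1, 4,4,4, 2, 3,3)
)

result <- data %>%
  # category is constant per id, so adding it here does not split any id/item.id pair
  group_by(category, id, item.id) %>%
  summarize(max.seq = max(sequence)) %>%
  # drop the leftover grouping before regrouping by category alone
  ungroup() %>%
  group_by(category) %>%
  summarize(mean.max.seq = mean(max.seq))

result
```

Naming the summary column explicitly avoids the backtick-quoted `mean(max.seq)` column name, which can be awkward to reference later.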

My calculations are slightly different from yours, so please double-check my method before accepting.
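
One way to double-check the method is to recompute the same quantity in base R, without dplyr. This is only a sketch: per.item and check are hypothetical names introduced here, and it assumes category is constant within each id, as in the example data.

```r
# Example data from the question
data <- data.frame(
  id       = c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8),
  item.id  = c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2),
  sequence = c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1),
  score    = c(0,0,0, 0,0,1, 1,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,1),
  category = c(2,2,2, 3,3,3, 1,1, 3,3, 1,1,1,1, 4,4,4, 2, 3,3)
)

# Step 1: maximum sequence per id/item.id pair, keeping category alongside
per.item <- aggregate(sequence ~ id + item.id + category, data = data, FUN = max)

# Step 2: mean of those maxima within each category, as a named vector
check <- tapply(per.item$sequence, per.item$category, mean)

round(check, 2)
```

tapply() returns one value per category with the category as the name, which is close to the cross-tab layout asked for in the question. On the example data the values are 2, 1.33, 1.4 and 3 for categories 1 to 4.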

Upvotes: 1
