Reputation: 1979
I have an example of a dataset like this:
id <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8)
item.id <- c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1)
score <- c(0,0,0, 0,0,1, 1,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,1)
category <- c(2,2,2, 3,3,3, 1,1, 3,3, 1,1,1,1, 4,4,4, 2, 3,3)
data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score, "category"=category)
> data
id item.id sequence score category
1 1 1 1 0 2
2 1 1 2 0 2
3 1 2 1 0 2
4 2 1 1 0 3
5 2 1 2 0 3
6 2 1 3 1 3
7 3 1 1 1 1
8 3 1 2 0 1
9 4 1 1 1 3
10 4 2 1 1 3
11 5 1 1 1 1
12 5 2 1 0 1
13 5 2 2 1 1
14 5 2 3 1 1
15 6 1 1 0 4
16 6 1 2 0 4
17 6 1 3 0 4
18 7 1 1 1 2
19 8 1 1 0 3
20 8 2 1 1 3
`id` represents persons and `item.id` identifies questions. `sequence` numbers the attempts to change the response, `score` is the score on the item, and `category` is the category each student falls in.
What I want to do is take the maximum `sequence` number for each `id` per `item.id`, then calculate the mean of those maximum sequence values for each `category`. I was able to complete the first step, but could not figure out how to cross-tabulate the mean of the maximum sequence number per `category`.
library(dplyr)
data %>%
group_by(id,item.id) %>%
summarize(max.seq = max(sequence))
# A tibble: 12 x 3
# Groups: id [?]
id item.id max.seq
<dbl> <dbl> <dbl>
1 1 1 2
2 1 2 1
3 2 1 3
4 3 1 2
5 4 1 1
6 4 2 1
7 5 1 1
8 5 2 3
9 6 1 3
10 7 1 1
11 8 1 1
12 8 2 1
The result of the second step should be:
category 1 2 3 4
mean(max(seq)) 2 1.33 1.4 3
Any suggestions?
Thanks in advance!
Upvotes: 0
Views: 868
Reputation: 24069
You need to carry the category value into the summary table. Since category is constant within each id/item.id combination, taking its mean inside summarize is one way to do that: the mean of identical values is just that value.
library(dplyr)
data %>%
group_by(id,item.id) %>%
summarize(cat=mean(category), max.seq = max(sequence)) %>%
group_by(cat) %>% summarize(mean(max.seq))
# A tibble: 4 x 2
cat `mean(max.seq)`
<dbl> <dbl>
1 1 2
2 2 1.33
3 3 1.4
4 4 3
My calculations are slightly different from yours, so please double-check my method before accepting.
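As a way to double-check, here is a minimal sketch of two alternatives, assuming the same example data as in the question (the score column is left out because it is not used). The first keeps category as a grouping variable instead of averaging it; the second repeats the calculation in base R with aggregate and tapply. The names by_cat, max_seq, check and mean.max.seq are made up for illustration.

```r
library(dplyr)

# Example data from the question (score omitted, it is not needed here)
data <- data.frame(
  id       = c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8),
  item.id  = c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2),
  sequence = c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1),
  category = c(2,2,2, 3,3,3, 1,1, 3,3, 1,1,1,1, 4,4,4, 2, 3,3)
)

# dplyr: include category in the first grouping so it survives summarize,
# then regroup by category alone and average the per-item maxima
by_cat <- data %>%
  group_by(category, id, item.id) %>%
  summarize(max.seq = max(sequence)) %>%
  group_by(category) %>%
  summarize(mean.max.seq = mean(max.seq))

# Base R cross-check: max sequence per id/item.id/category,
# then the mean of those maxima within each category
max_seq <- aggregate(sequence ~ id + item.id + category, data = data, FUN = max)
check   <- tapply(max_seq$sequence, max_seq$category, mean)

print(by_cat)
print(round(check, 2))
```

tapply returns a vector named by category, which is close to the one-row layout asked for in the question. Both approaches should reproduce the values 2, 1.33, 1.4 and 3 for categories 1 to 4.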
Upvotes: 1