Reputation: 143
First, I apologize for the somewhat non-descriptive title but as you can probably see below, the problem is not easy to sum up in a single question.
I have a data set of peer evaluations, where group members rated each other on a set of criteria. Unfortunately, I wasn't able to pre-populate respondents' group members in the the survey so I ended up with a very jumbled data set that looks like this:
data = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"),
gm2 = c("Bill", "Lin", "Bill", "Lin"),
gm3 = c("Cass", "Dan", "Lin", "Cass"),
gm4 = c("Dan", "Cass", "Dan", "Bill"),
crit1_1 = c(5, 2, 4, 4),
crit1_2 = c(5, 3, 3, 3),
crit1_3 = c(4, 2, 4, 4),
crit1_4 = c(5, 5, 4, 4))
where gm1 is the respondent and crit1 represents their self-ratings and gm2-4 are their group members with crit1_2-4 representing their ratings of each member on the first criteria (there are multiple). I need to calculate the average of the group's ratings for each member but as you can see, there is no consistent pattern for who is gm1-4. For example, for Lin, I need an average of Bill, Cass, and Dan's ratings of Lin.
I know this is a bit of a mess but I hoping there is a relatively simple way of transforming this data so that I can perform the calculations that I need to.
I've thought about assigning each group member a number like so
key = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"),
num = c(1, 2, 3, 4))
data1 = data[ , c(1,2,3,4)]
data1[] <- key$num[match(unlist(data), key$gm1)]
data2 = cbind(data1, data[ , -c(1,2,3,4)])
and then sorting columns gm1-4 so that the pattern is always the same for each criteria. However, I do not know how to sort gm1-4 for each row in such a way that creates corresponding changes in crit1_1-4.
Any suggestions would be much appreciated!
Upvotes: 1
Views: 85
Reputation: 8880
base
data = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"),
gm2 = c("Bill", "Lin", "Bill", "Lin"),
gm3 = c("Cass", "Dan", "Lin", "Cass"),
gm4 = c("Dan", "Cass", "Dan", "Bill"),
crit1_1 = c(5, 2, 4, 4),
crit1_2 = c(5, 3, 3, 3),
crit1_3 = c(4, 2, 4, 4),
crit1_4 = c(5, 5, 4, 4))
cols_gm <- sapply(data, is.character)
cols_crit <- sapply(data, is.numeric)
crit <- data[cols_crit]
gm <- data[cols_gm]
nm <- unique(unlist(gm))
sapply(nm, function(x) {
coord <- matrix(c(seq(nrow(data)), max.col(gm == x)), nrow = nrow(data))
coord <- coord[!(coord[, 2] == 1), ]
mean(crit[coord])
})
#> Lin Bill Cass Dan
#> 3.333333 4.000000 4.333333 3.666667
Created on 2021-04-19 by the reprex package (v2.0.0)
Upvotes: 1
Reputation: 30549
Perhaps this might be more manageable in long form. With pivot_longer
from tidyr
, you can extend your data for each group member and criterion. The filter
and select
will remove the "self-rating" data.
library(tidyverse)
data %>%
pivot_longer(cols = starts_with("crit"),
values_to = "rating",
names_to = c("criteria", "rating_number"),
names_pattern = "crit(\\d+)_(\\d+)") %>%
filter(rating_number != 1) %>%
select(-gm1) %>%
pivot_longer(cols = starts_with("gm"),
names_pattern = "gm(\\d+)",
names_to = "member_number",
values_to = "member_name") %>%
filter(member_number == rating_number) %>%
group_by(member_name, criteria) %>%
summarise(mean_rating = mean(rating))
Output
member_name criteria mean_rating
<chr> <chr> <dbl>
1 Bill 1 4
2 Cass 1 4.33
3 Dan 1 3.67
4 Lin 1 3.33
Upvotes: 1