Addison
Addison

Reputation: 143

Sorting columns based on individual row values in R

First, I apologize for the somewhat non-descriptive title but as you can probably see below, the problem is not easy to sum up in a single question.

I have a data set of peer evaluations, where group members rated each other on a set of criteria. Unfortunately, I wasn't able to pre-populate respondents' group members in the the survey so I ended up with a very jumbled data set that looks like this:

data = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"), 
                  gm2 = c("Bill", "Lin", "Bill", "Lin"),
                  gm3 = c("Cass", "Dan", "Lin", "Cass"),
                  gm4 = c("Dan", "Cass", "Dan", "Bill"),
                  crit1_1 = c(5, 2, 4, 4), 
                  crit1_2 = c(5, 3, 3, 3), 
                  crit1_3 = c(4, 2, 4, 4), 
                  crit1_4 = c(5, 5, 4, 4))

where gm1 is the respondent and crit1 represents their self-ratings and gm2-4 are their group members with crit1_2-4 representing their ratings of each member on the first criteria (there are multiple). I need to calculate the average of the group's ratings for each member but as you can see, there is no consistent pattern for who is gm1-4. For example, for Lin, I need an average of Bill, Cass, and Dan's ratings of Lin.

I know this is a bit of a mess but I hoping there is a relatively simple way of transforming this data so that I can perform the calculations that I need to.

I've thought about assigning each group member a number like so

key = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"), 
                 num = c(1, 2, 3, 4))
data1 = data[ , c(1,2,3,4)]
data1[] <- key$num[match(unlist(data), key$gm1)]
data2 = cbind(data1, data[ , -c(1,2,3,4)])

and then sorting columns gm1-4 so that the pattern is always the same for each criteria. However, I do not know how to sort gm1-4 for each row in such a way that creates corresponding changes in crit1_1-4.

Any suggestions would be much appreciated!

Upvotes: 1

Views: 85

Answers (2)

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

base

data = data.frame(gm1 = c("Lin", "Bill", "Cass", "Dan"), 
                  gm2 = c("Bill", "Lin", "Bill", "Lin"),
                  gm3 = c("Cass", "Dan", "Lin", "Cass"),
                  gm4 = c("Dan", "Cass", "Dan", "Bill"),
                  crit1_1 = c(5, 2, 4, 4), 
                  crit1_2 = c(5, 3, 3, 3), 
                  crit1_3 = c(4, 2, 4, 4), 
                  crit1_4 = c(5, 5, 4, 4))



cols_gm <- sapply(data, is.character)
cols_crit <- sapply(data, is.numeric)


crit <- data[cols_crit]
gm <- data[cols_gm]
nm <- unique(unlist(gm))


sapply(nm, function(x) {
  coord <- matrix(c(seq(nrow(data)), max.col(gm == x)), nrow = nrow(data))
  coord <- coord[!(coord[, 2] == 1), ]
  mean(crit[coord])
})
#>      Lin     Bill     Cass      Dan 
#> 3.333333 4.000000 4.333333 3.666667

Created on 2021-04-19 by the reprex package (v2.0.0)

Upvotes: 1

Ben
Ben

Reputation: 30549

Perhaps this might be more manageable in long form. With pivot_longer from tidyr, you can extend your data for each group member and criterion. The filter and select will remove the "self-rating" data.

library(tidyverse)

data %>%
  pivot_longer(cols = starts_with("crit"), 
               values_to = "rating", 
               names_to = c("criteria", "rating_number"), 
               names_pattern = "crit(\\d+)_(\\d+)") %>%
  filter(rating_number != 1) %>%
  select(-gm1) %>%
  pivot_longer(cols = starts_with("gm"), 
               names_pattern = "gm(\\d+)", 
               names_to = "member_number", 
               values_to = "member_name") %>%
  filter(member_number == rating_number) %>%
  group_by(member_name, criteria) %>%
  summarise(mean_rating = mean(rating))

Output

  member_name criteria mean_rating
  <chr>       <chr>          <dbl>
1 Bill        1               4   
2 Cass        1               4.33
3 Dan         1               3.67
4 Lin         1               3.33

Upvotes: 1

Related Questions