Reputation: 823
I have data of how team members rated one another, from multiple teams. Each person has their own id number but also a team and rater number within the team like so:
StudyID TeamID CATMERater Rated Rating
(int) (int) (int) (dbl) (dbl)
1 2930 551 1 1 5.000000 #How rater 1 rated 1 (themselves)
2 2938 551 2 1 3.800000 #How rater 2 rated 1
3 2939 551 3 1 5.000000 #How rater 3 rated 1
4 2930 551 1 2 3.666667 #How rater 1 rated 2
5 2938 551 2 2 4.000000 #...
6 2939 551 3 2 3.866667
...
and so on. I got this format using tidyr
and am trying to get a new column of the StudyID where the TeamID and person being rated are the same. This is what I've tried, but doesn't work because I'm not sure how to reference the same table:
edges %>% mutate(RatedStudyID = filter(edges, TeamID == TeamID & Rated == CATMERater))
Hopefully this makes sense, but I'd appreciate a recommendation to get headed in the right direction. If it is something with left_join
how do I say where TeamID == TeamID
?
Here's what I'd like to see in the end (mostly the last column though):
StudyID TeamID CATMERater Rated Rating RatedStudyID
(int) (int) (int) (dbl) (dbl)
1 2930 551 1 1 5.000000 2930
2 2938 551 2 1 3.800000 2930
3 2939 551 3 1 5.000000 2930
4 2930 551 1 2 3.666667 2938
5 2938 551 2 2 4.000000 2938
6 2939 551 3 2 3.866667 2938
...
dput results per @akron that gives an error:
structure(list(StudyID = c(2930L, 2938L, 2939L, 2930L, 2938L,
2939L, 2930L, 2938L, 2939L, 2930L, 2938L, 2939L, 2930L, 2938L,
2939L, 2930L, 2938L, 2939L, 2920L, 2941L, 2989L, 2920L, 2941L,
2989L, 2920L, 2941L, 2989L, 2920L, 2941L, 2989L, 2920L, 2941L,
2989L, 2920L, 2941L, 2989L, 2922L, 2924L, 2943L, 2922L, 2924L,
2943L, 2922L, 2924L, 2943L, 2922L, 2924L, 2943L, 2922L, 2924L
), TeamID = c(551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L,
551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 552L,
552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L,
552L, 552L, 552L, 552L, 552L, 552L, 553L, 553L, 553L, 553L, 553L,
553L, 553L, 553L, 553L, 553L, 553L, 553L, 553L, 553L), CATMERater = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L,
2L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L), Rated = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6,
6, 6, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 1,
1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5), Rating = c(5, 3.8, 5,
3.66666666666667, 4, 3.86666666666667, 4.53333333333333, 4, 4.8,
NaN, NaN, NaN, NaN, NaN, NaN, NA, NA, NA, 3.93333333333333, 5,
5, 5, 5, 5, 5, 5, 5, NaN, NaN, NaN, NaN, NaN, NaN, NA, NA, NA,
4, 4, 4, 4, 4, 4, 4, 3.86666666666667, 4, NaN, NaN, NaN, NaN,
NaN)), .Names = c("StudyID", "TeamID", "CATMERater", "Rated",
"Rating"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-50L))
Upvotes: 2
Views: 330
Reputation: 24955
From the comments:
library(dplyr)
x %>%
group_by(Rated, TeamID) %>% #group by each team/rated individual
filter(any(CATMERater == Rated)) %>% #filter out any groups with unrated individuals
mutate(new = StudyID[CATMERater == Rated]) #make the new column
The new column is created by subsetting each group - it is the same as x$StudyID[x$CATMERater == x$Rated]
would be on the whole dataframe. As long as we have one place where this is true (ie self rating) the value is set to every member of that group.
Upvotes: 2
Reputation: 18487
I think you can solve this with a join
edges %>%
select(TeamID, Rated = CATMERater, RaterStudyID = StudyID) %>%
inner_join(edges, by = c("TeamID", "Rated"))
Upvotes: 0
Reputation: 887951
With data.table
library(data.table)
setDT(edges)[ , RatedStudyID := StudyID[CATMERater == Rated] , .(Rated, TeamID)]
edges
# StudyID TeamID CATMERater Rated Rating RatedStudyID
#1: 2930 551 1 1 5.000000 2930
#2: 2938 551 2 1 3.800000 2930
#3: 2939 551 3 1 5.000000 2930
#4: 2930 551 1 2 3.666667 2938
#5: 2938 551 2 2 4.000000 2938
#6: 2939 551 3 2 3.866667 2938
In the new dataset, there are groups that don't have any similar values for CATMERater and Rated in the same row. So, we can use an exception to return NA for those.
setDT(df1)[, RatedStudyID :=if(!any(CATMERater==Rated)) NA_integer_
else StudyID[CATMERater ==Rated], .(Rated, TeamID)]
Upvotes: 1