Andrew Jackson
Andrew Jackson

Reputation: 823

dplyr lookup from same table

I have data of how team members rated one another, from multiple teams. Each person has their own id number but also a team and rater number within the team like so:

  StudyID TeamID CATMERater Rated   Rating
    (int)  (int)      (int) (dbl)    (dbl)
1    2930    551          1     1 5.000000 #How rater 1 rated 1 (themselves)
2    2938    551          2     1 3.800000 #How rater 2 rated 1
3    2939    551          3     1 5.000000 #How rater 3 rated 1
4    2930    551          1     2 3.666667 #How rater 1 rated 2
5    2938    551          2     2 4.000000 #...
6    2939    551          3     2 3.866667
...

and so on. I got this format using tidyr and am trying to get a new column of the StudyID where the TeamID and person being rated are the same. This is what I've tried, but doesn't work because I'm not sure how to reference the same table:

edges %>% mutate(RatedStudyID = filter(edges, TeamID == TeamID & Rated == CATMERater))

Hopefully this makes sense, but I'd appreciate a recommendation to get headed in the right direction. If it is something with left_join how do I say where TeamID == TeamID?

Here's what I'd like to see in the end (mostly the last column though):

  StudyID TeamID CATMERater Rated   Rating   RatedStudyID
    (int)  (int)      (int) (dbl)    (dbl)   
1    2930    551          1     1 5.000000   2930
2    2938    551          2     1 3.800000   2930
3    2939    551          3     1 5.000000   2930
4    2930    551          1     2 3.666667   2938
5    2938    551          2     2 4.000000   2938
6    2939    551          3     2 3.866667   2938
...

dput results per @akron that gives an error:

structure(list(StudyID = c(2930L, 2938L, 2939L, 2930L, 2938L, 
2939L, 2930L, 2938L, 2939L, 2930L, 2938L, 2939L, 2930L, 2938L, 
2939L, 2930L, 2938L, 2939L, 2920L, 2941L, 2989L, 2920L, 2941L, 
2989L, 2920L, 2941L, 2989L, 2920L, 2941L, 2989L, 2920L, 2941L, 
2989L, 2920L, 2941L, 2989L, 2922L, 2924L, 2943L, 2922L, 2924L, 
2943L, 2922L, 2924L, 2943L, 2922L, 2924L, 2943L, 2922L, 2924L
), TeamID = c(551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 
551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 552L, 
552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 
552L, 552L, 552L, 552L, 552L, 552L, 553L, 553L, 553L, 553L, 553L, 
553L, 553L, 553L, 553L, 553L, 553L, 553L, 553L, 553L), CATMERater = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 
2L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L), Rated = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 
6, 6, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 1, 
1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5), Rating = c(5, 3.8, 5, 
3.66666666666667, 4, 3.86666666666667, 4.53333333333333, 4, 4.8, 
NaN, NaN, NaN, NaN, NaN, NaN, NA, NA, NA, 3.93333333333333, 5, 
5, 5, 5, 5, 5, 5, 5, NaN, NaN, NaN, NaN, NaN, NaN, NA, NA, NA, 
4, 4, 4, 4, 4, 4, 4, 3.86666666666667, 4, NaN, NaN, NaN, NaN, 
NaN)), .Names = c("StudyID", "TeamID", "CATMERater", "Rated", 
"Rating"), class = c("tbl_df", "data.frame"), row.names = c(NA, 
-50L))

Upvotes: 2

Views: 330

Answers (3)

jeremycg
jeremycg

Reputation: 24955

From the comments:

library(dplyr)
x %>%
   group_by(Rated, TeamID) %>% #group by each team/rated individual
   filter(any(CATMERater == Rated)) %>% #filter out any groups with unrated individuals
   mutate(new = StudyID[CATMERater == Rated]) #make the new column

The new column is created by subsetting each group - it is the same as x$StudyID[x$CATMERater == x$Rated] would be on the whole dataframe. As long as we have one place where this is true (ie self rating) the value is set to every member of that group.

Upvotes: 2

Thierry
Thierry

Reputation: 18487

I think you can solve this with a join

edges %>%
  select(TeamID, Rated = CATMERater, RaterStudyID = StudyID) %>%
  inner_join(edges, by = c("TeamID", "Rated"))

Upvotes: 0

akrun
akrun

Reputation: 887951

With data.table

library(data.table)
setDT(edges)[ , RatedStudyID := StudyID[CATMERater == Rated] , .(Rated, TeamID)]
edges
#   StudyID TeamID CATMERater Rated   Rating RatedStudyID
#1:    2930    551          1     1 5.000000         2930
#2:    2938    551          2     1 3.800000         2930
#3:    2939    551          3     1 5.000000         2930
#4:    2930    551          1     2 3.666667         2938
#5:    2938    551          2     2 4.000000         2938
#6:    2939    551          3     2 3.866667         2938

In the new dataset, there are groups that don't have any similar values for CATMERater and Rated in the same row. So, we can use an exception to return NA for those.

setDT(df1)[, RatedStudyID :=if(!any(CATMERater==Rated)) NA_integer_
             else StudyID[CATMERater ==Rated], .(Rated, TeamID)]

Upvotes: 1

Related Questions