Reputation: 103
My dataset has repeated observations for people that work on projects. I need a data frame with two columns that list 'combinations' of projects for each person and time point. Let me explain with an example:
This is my data:
ID Week Project
01 1 101
01 1 102
01 1 103
01 2 101
01 2 102
02 1 101
02 1 102
02 2 101
Person 1 (ID = 1) worked on three projects in week 1. This means that there are six possible combinations of projects (project_i & project_j) for this person, in this week.
This is what I need
ID Week Project_i Project_j
01 1 101 101
01 1 101 102
01 1 101 103
01 1 102 101
01 1 102 102
01 1 102 103
01 1 103 101
01 1 103 102
01 1 103 103
01 2 101 101
01 2 101 102
01 2 102 101
01 2 102 102
02 1 101 101
02 1 101 102
02 1 102 101
02 1 102 102
02 2 101 101
Losing cases that only have one project per week is not an issue.
I have tried basic r and reshape2 for a bit, but I can't figure this out.
Upvotes: 3
Views: 102
Reputation: 43354
Here's a base option using expand.grid
:
do.call(rbind, lapply(split(df, paste(df$ID, df$Week)), function(x){
x2 <- expand.grid(ID = unique(x$ID),
Week = unique(x$Week),
Project_i = unique(x$Project),
Project_j = unique(x$Project))
# omit if 101 102 is different from 102 101; make `<` if 101 101 not possible
x2[x2$Project_i <= x2$Project_j,]
}))
# ID Week Project_i Project_j
# 1 1.1 1 1 101 101
# 1 1.4 1 1 101 102
# 1 1.5 1 1 102 102
# 1 1.7 1 1 101 103
# 1 1.8 1 1 102 103
# 1 1.9 1 1 103 103
# 1 2.1 1 2 101 101
# 1 2.3 1 2 101 102
# 1 2.4 1 2 102 102
# 2 1.1 2 1 101 101
# 2 1.3 2 1 101 102
# 2 1.4 2 1 102 102
# 2 2 2 2 101 101
Upvotes: 5
Reputation: 66819
Here's one way:
library(data.table)
setDT(DT)
DT[, CJ(P1 = Project, P2 = Project)[P1 != P2], by=.(ID, Week)]
ID Week P1 P2
1: 1 1 101 102
2: 1 1 101 103
3: 1 1 102 101
4: 1 1 102 103
5: 1 1 103 101
6: 1 1 103 102
7: 1 2 101 102
8: 1 2 102 101
9: 2 1 101 102
10: 2 1 102 101
CJ
is the Cartesian Join of two vectors, taking all combinations.
If you don't want both (101,102) and (102,101), use P1 > P2
instead of P1 != P2
. Oh, the OP has changed the question... so use P1 <= P2
.
Upvotes: 6
Reputation: 9133
Here is a solution that uses dplyr
and tidyr
. The key step is tidyr::complete()
combined with dplyr::group_by()
library(dplyr)
library(tidyr)
d %>%
rename(Project_i = Project) %>%
mutate(Project_j = Project_i) %>%
group_by(ID, Week) %>%
complete(Project_i, Project_j) %>%
filter(Project_i != Project_j)
Upvotes: 6