I am trying to perform a meta-analysis on a dataset in which several authors contributed more than one study, which might bias the results. Therefore, I want to extract every possible combination of rows in which each Author appears exactly once.
Sample data:
sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016', '2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020', 'a 2016', 'b 2020', 'b 2010', 'c 2005'))
Sample:
Author Year Value UniqueName
1 a 2020 3 a 2020
2 a 2016 1 a 2016
3 b 2020 2 b 2020
4 b 2010 4 b 2010
5 c 2005 5 c 2005
I would like to extract all possible combinations of rows (in this case, 4 of them) in which each Author appears exactly once:
> output1
Author Year Value UniqueName
1 a 2020 3 a 2020
2 b 2020 2 b 2020
3 c 2005 5 c 2005
> output2
Author Year Value UniqueName
1 a 2016 1 a 2016
2 b 2020 2 b 2020
3 c 2005 5 c 2005
> output3
Author Year Value UniqueName
1 a 2016 1 a 2016
2 b 2010 4 b 2010
3 c 2005 5 c 2005
> output4
Author Year Value UniqueName
1 a 2020 3 a 2020
2 b 2010 4 b 2010
3 c 2005 5 c 2005
In the end, I will run the analysis on each of these 4 data frames, but I don't know how to generate them without doing it by hand.
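The number of such data frames is the product of the number of studies per author: here 2 (a) x 2 (b) x 1 (c) = 4. A quick base-R check of that count, useful before generating anything because it grows multiplicatively as more authors contribute several studies:

```r
# Sample data from the question
sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016','2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020','a 2016','b 2020','b 2010','c 2005'))

# Number of studies per author: a = 2, b = 2, c = 1
studies_per_author <- table(sample$Author)

# Choosing exactly one study per author gives the product of those counts
n_combinations <- prod(studies_per_author)
n_combinations
#> [1] 4
```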
There may be a less hacky way, but this seems to work.
My idea is to split your data frame by author and brute-force every combination of row indices (one row per author) with expand.grid. Then lapply builds a list of data frames, one per combination, from those row indices.
Here is the code:
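# One data frame per author
splitsample <- split(sample, sample$Author)
# Every combination of row positions, one column per author (\(x) needs R >= 4.1)
outputs_rows <- expand.grid(lapply(splitsample, \(x) seq_len(nrow(x))))
names_authors <- colnames(outputs_rows)
# For each combination, stack the chosen row from each author's data frame
outputs <- lapply(seq_len(nrow(outputs_rows)),
                  function(row) {
                    df <- data.frame()
                    for (aut in names_authors) {
                      df <- rbind(df, splitsample[[aut]][outputs_rows[row, aut], ])
                    }
                    return(df)
                  })
outputs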
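A slightly more compact variant of the same idea (a sketch, checked only against the sample data): split the row indices instead of the data frame, so each combination becomes a single subset of sample and no rbind loop is needed. It relies on expand.grid accepting a single named list, which it does. It should give the same list, in the same order and with the same row names, as the loop version.

```r
# Sample data from the question
sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016','2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020','a 2016','b 2020','b 2010','c 2005'))

# Row indices of each author's studies: list(a = 1:2, b = 3:4, c = 5)
idx_by_author <- split(seq_len(nrow(sample)), sample$Author)

# One row per combination, one column per author
combos <- expand.grid(idx_by_author)

# Each combination is a single row subset of the original data frame
outputs <- lapply(seq_len(nrow(combos)), function(i) sample[unlist(combos[i, ]), ])
outputs
```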
And the result looks like this:
> outputs
[[1]]
Author Year Value UniqueName
1 a 2020 3 a 2020
3 b 2020 2 b 2020
5 c 2005 5 c 2005
[[2]]
Author Year Value UniqueName
2 a 2016 1 a 2016
3 b 2020 2 b 2020
5 c 2005 5 c 2005
[[3]]
Author Year Value UniqueName
1 a 2020 3 a 2020
4 b 2010 4 b 2010
5 c 2005 5 c 2005
[[4]]
Author Year Value UniqueName
2 a 2016 1 a 2016
4 b 2010 4 b 2010
5 c 2005 5 c 2005
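Once the list exists, the per-subset analysis can be run with vapply (or lapply if the model returns more than a single number). The sketch below rebuilds outputs so it runs on its own, and uses mean(Value) purely as a hypothetical placeholder (analyse_subset is an illustrative name) for whatever meta-analytic model you actually fit. Labelling each result with the studies it contains makes it easy to see which choice of study per author drives the differences.

```r
# Sample data from the question
sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016','2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020','a 2016','b 2020','b 2010','c 2005'))

# Rebuild the list of one-study-per-author subsets
combos <- expand.grid(split(seq_len(nrow(sample)), sample$Author))
outputs <- lapply(seq_len(nrow(combos)), function(i) sample[unlist(combos[i, ]), ])

# Hypothetical placeholder for the real meta-analysis: mean Value per subset
analyse_subset <- function(d) mean(d$Value)

# One result per subset, named after the studies it contains
results <- vapply(outputs, analyse_subset, numeric(1))
names(results) <- vapply(outputs, function(d) paste(d$UniqueName, collapse = " + "),
                         character(1))
results
```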
I hope this helps.