SteveMcManaman

Reputation: 433

Extract all possible combinations of rows with unique values in a variable

I am trying to perform a meta-analysis on a dataset in which several authors contribute multiple studies, which might introduce bias. Therefore, I want to extract all possible combinations of rows in which each Author appears exactly once.

Sample data:

sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016', '2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020', 'a 2016', 'b 2020', 'b 2010', 'c 2005'))

Sample:

  Author Year Value UniqueName
1      a 2020     3     a 2020
2      a 2016     1     a 2016
3      b 2020     2     b 2020
4      b 2010     4     b 2010
5      c 2005     5     c 2005

I would like to extract all possible combinations of rows (in this case, 4 possibilities) where each Author appears exactly once.
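As a quick sanity check on that count: the number of combinations should be the product of the number of rows per author (2 × 2 × 1 here). A minimal sketch using the sample data above; `n_combinations` is just an illustrative name:

```r
# Sample data from the question
sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016', '2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020', 'a 2016', 'b 2020', 'b 2010', 'c 2005'))

# Rows per author, multiplied together, gives the number of
# data frames in which every author appears exactly once
n_combinations <- prod(table(sample$Author))
n_combinations
```

This number grows multiplicatively with the number of studies per author, so it is worth computing before generating every combination on a larger dataset.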

> output1
  Author Year Value UniqueName
1      a 2020     3     a 2020
2      b 2020     2     b 2020
3      c 2005     5     c 2005


> output2
  Author Year Value UniqueName
1      a 2016     1     a 2016
2      b 2020     2     b 2020
3      c 2005     5     c 2005


> output3
  Author Year Value UniqueName
1      a 2016     1     a 2016
2      b 2010     4     b 2010
3      c 2005     5     c 2005


> output4
  Author Year Value UniqueName
1      a 2020     3     a 2020
2      b 2010     4     b 2010
3      c 2005     5     c 2005

Ultimately, I will run the analysis on each of these 4 extracted data frames, but I don't know how to generate them without building each one by hand.

Upvotes: 2

Views: 168

Answers (1)

Guillaume Mulier

Reputation: 503

Maybe a less hacky way exists, but I seem to have a working solution.

My idea was to split your data frame by author and brute-force the combinations of row indices with expand.grid. Then lapply builds a list of data.frames, one per combination, by picking the indexed row from each author's subset.

Here is the code:

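# One data.frame per author
splitsample <- split(sample, sample$Author)
# Every combination of within-author row indices, one column per author
# (the \(x) lambda shorthand needs R >= 4.1; use function(x) on older versions)
outputs_rows <- expand.grid(lapply(splitsample, \(x) seq_len(nrow(x))))
names_authors <- colnames(outputs_rows)
# For each combination, stack the selected row of each author
outputs <- lapply(seq_len(nrow(outputs_rows)),
                  function(row) {
                    df <- data.frame()
                    for (aut in names_authors) {
                      df <- rbind(df, splitsample[[aut]][outputs_rows[row, aut], ])
                    }
                    return(df)
                  })
outputs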

And the result looks like this:

> outputs
[[1]]
  Author Year Value UniqueName
1      a 2020     3     a 2020
3      b 2020     2     b 2020
5      c 2005     5     c 2005

[[2]]
  Author Year Value UniqueName
2      a 2016     1     a 2016
3      b 2020     2     b 2020
5      c 2005     5     c 2005

[[3]]
  Author Year Value UniqueName
1      a 2020     3     a 2020
4      b 2010     4     b 2010
5      c 2005     5     c 2005

[[4]]
  Author Year Value UniqueName
2      a 2016     1     a 2016
4      b 2010     4     b 2010
5      c 2005     5     c 2005
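For what it's worth, the same idea can probably be written more compactly by running expand.grid directly on the original row numbers, grouped by author, and then subsetting the original data frame. This is only a sketch under the assumption that the data look like the sample in the question; `idx` and `outputs_alt` are names I made up for illustration:

```r
# Sample data from the question
sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016', '2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020', 'a 2016', 'b 2020', 'b 2010', 'c 2005'))

# Row numbers of the original data frame, grouped by author;
# expand.grid accepts a list and returns one row per combination
idx <- expand.grid(split(seq_len(nrow(sample)), sample$Author))

# Each row of idx holds one original row number per author,
# so it can be used directly to subset the data frame
outputs_alt <- lapply(seq_len(nrow(idx)),
                      function(i) sample[unlist(idx[i, ]), ])

length(outputs_alt)
```

This avoids growing a data frame with rbind inside a loop, which may matter if there are many authors. The list can then be passed to lapply again to run the analysis on each data frame.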

I hope this helped you.

Upvotes: 3
