Split data into lists based on unique combinations of the variables

Question

I am trying to split my data into a list of data frames, one for each pairwise combination of the values in a column, and then apply a normalisation function that uses the min and max of the data in each combination.

If I am using the iris data set then I would like to have the following combinations.

setosa & versicolor 
versicolor & virginica
setosa & virginica

Where I will apply the following function:

# min_m and max_m default to the range of m, so normaliseData(m) also works
normaliseData <- function(m, min_m = min(m), max_m = max(m)) {
  (m - min_m) / (max_m - min_m)
}

So the first data frame will be normalised using the min and max taken over setosa and versicolor together. I am only interested in applying the normalisation to a single column in the data frame, i.e.

iris %>% 
  select(Petal.Width, Species)

I can get the min and max for one combination using:

iris %>% 
  filter(Species == "setosa" | Species == "versicolor") %>% 
  summarise(
    min_m = min(Petal.Width),
    max_m = max(Petal.Width)
  )

My idea is to first create a list with one data frame for each combination, then map the normalisation function over the list. I have looked at the combn function but cannot get it to split the data.

EDIT: I would ignore the part below. Running:

expand.grid(colnames(iris[,1:4]), colnames(iris[,1:4])) %>% 
  filter(!Var1 == Var2)

Gets me:

           Var1         Var2
1   Sepal.Width Sepal.Length
2  Petal.Length Sepal.Length
3   Petal.Width Sepal.Length
4  Sepal.Length  Sepal.Width
5  Petal.Length  Sepal.Width
6   Petal.Width  Sepal.Width
7  Sepal.Length Petal.Length
8   Sepal.Width Petal.Length
9   Petal.Width Petal.Length
10 Sepal.Length  Petal.Width
11  Sepal.Width  Petal.Width
12 Petal.Length  Petal.Width

But this does not give me unique combinations as such, since "Sepal.Width & Sepal.Length" is the same as "Sepal.Length & Sepal.Width".

akrun · Accepted Answer

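We can use combn on the levels of 'Species' and, inside the function passed as FUN, filter the rows that belong to each pair of levels

library(dplyr)
library(purrr)
library(stringr)  # provides str_c(), used below for the list names
# For each pair of species, keep its rows and normalise every numeric column.
# reframe() needs dplyr >= 1.1.0; on older versions summarise() does the same.
out_lst <- combn(levels(iris$Species), 2, FUN = function(x)
        iris %>% 
           filter(Species %in% x) %>% 
           reframe(across(where(is.numeric), normaliseData)), simplify = FALSE)
names(out_lst) <- combn(levels(iris$Species), 2, str_c, collapse="_")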
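If only Petal.Width needs to be rescaled, as in the question, the same combn pattern can be narrowed to that column while keeping 'Species'. The following is a minimal sketch rather than part of the original answer: it redefines normaliseData with default arguments so that it runs on its own, and the names pairs and pw_lst are made up for illustration.

```r
library(dplyr)

# Min-max scaling; the defaults fall back to the range of m itself.
normaliseData <- function(m, min_m = min(m), max_m = max(m)) {
  (m - min_m) / (max_m - min_m)
}

# All unordered pairs of species, as a list of character vectors of length 2.
pairs <- combn(levels(iris$Species), 2, simplify = FALSE)

# One data frame per pair; Petal.Width is rescaled with that pair's min and max.
pw_lst <- lapply(pairs, function(x) {
  iris %>%
    select(Petal.Width, Species) %>%
    filter(Species %in% x) %>%
    mutate(Petal.Width = normaliseData(Petal.Width))
})
names(pw_lst) <- vapply(pairs, paste, character(1), collapse = "_")
```

Each element of pw_lst then holds the 100 rows of one species pair, with Petal.Width running from 0 to 1 within that pair.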

If this also needs to be carried out for pairwise combinations of the numeric columns, create a three-column expanded dataset holding the levels of 'Species' and the two column names ('Var1', 'Var2')

library(tidyr)  # provides crossing()
expand_dat <- crossing(Species = levels(iris$Species), 
                       Var1 = names(iris)[1:4],
                       Var2 = names(iris)[1:4]) %>% 
   filter(Var1 != Var2)

Then use pmap to loop over the rows of expand_dat. For each row, filter 'iris' to the rows where 'Species' matches the first element (..1, the 'Species' value from expand_dat), then apply the normaliseData function across the two columns named in 'Var1' and 'Var2' (..2 and ..3)

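# reframe() needs dplyr >= 1.1.0; on older versions summarise() does the same.
out_lst2 <- expand_dat %>%
      pmap(~ iris %>% 
      filter(Species %in% ..1) %>%
      reframe(across(all_of(c(..2, ..3)), normaliseData)))
# 'Species' alone repeats across rows, so add the column names to keep list names unique
names(out_lst2) <- with(expand_dat, paste(Species, Var1, Var2, sep = "_"))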
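The expanded dataset above still contains both orderings of every column pair, which the question notes are really the same combination. If each pair should appear only once, one option is to build the pairs with combn on the column names. This is a rough sketch under that assumption, not the answer's own method: it swaps crossing and pmap for base expand.grid and Map, redefines normaliseData so that it runs on its own, and the names col_pairs, grid and out_lst3 are hypothetical.

```r
library(dplyr)

normaliseData <- function(m) (m - min(m)) / (max(m) - min(m))

# choose(4, 2) = 6 unordered pairs of the numeric column names.
col_pairs <- combn(names(iris)[1:4], 2, simplify = FALSE)

# Every species crossed with every column pair (3 * 6 = 18 rows).
grid <- expand.grid(
  Species = levels(iris$Species),
  pair = seq_along(col_pairs),
  stringsAsFactors = FALSE
)

# For each row of grid, keep one species and normalise the two chosen columns.
out_lst3 <- Map(function(sp, i) {
  iris %>%
    filter(Species == sp) %>%
    mutate(across(all_of(col_pairs[[i]]), normaliseData)) %>%
    select(all_of(col_pairs[[i]]))
}, grid$Species, grid$pair)

# Name each element as species_column1_column2.
pair_labels <- vapply(col_pairs[grid$pair], paste, character(1), collapse = "_")
names(out_lst3) <- paste(grid$Species, pair_labels, sep = "_")
```

This gives 18 list elements instead of 36, each with the 50 rows of one species and the two normalised columns.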
