I am trying to split my data into a list based on the pairwise combinations of the levels of one column, and then apply a normalisation function that uses the min and max of the data within each combination.
Using the iris data set, I would like the following combinations:
setosa & versicolor
versicolor & virginica
setosa & virginica
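I will then apply the following function to each combination:
normaliseData <- function(m, min_m = min(m), max_m = max(m)) {
  # Rescale m to [0, 1]; min_m and max_m default to m's own range
  # (the original body ignored these two arguments)
  (m - min_m) / (max_m - min_m)
}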
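The function maps the smallest value to 0 and the largest to 1. The min_m and max_m arguments look intended to let the bounds come from a wider set of rows (for example both species of a pair) instead of from m itself. A small sketch of the difference, where normalise_with_bounds is a hypothetical name:

```r
# Hypothetical helper: rescale m using supplied bounds,
# falling back to m's own range when no bounds are given.
normalise_with_bounds <- function(m, min_m = min(m), max_m = max(m)) {
  (m - min_m) / (max_m - min_m)
}

x <- c(2, 4, 6)
own   <- normalise_with_bounds(x)                         # 0.0 0.5 1.0
fixed <- normalise_with_bounds(x, min_m = 0, max_m = 10)  # 0.2 0.4 0.6
```

With bounds taken from a whole species pair, the values of each species land on a scale shared by that pair rather than spanning 0 to 1 on their own.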
So the first data frame should be normalised with the min and max of setosa and versicolor combined. I only want to apply the normalisation to a single column of the data frame, i.e.:
library(dplyr)
iris %>%
  select(Petal.Width, Species)
I can get the min and max for one combination using:
iris %>%
  filter(Species == "setosa" | Species == "versicolor") %>%
  summarise(
    min_m = min(Petal.Width),
    max_m = max(Petal.Width)
  )
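Those bounds can then be fed back in to rescale the same rows. A minimal sketch for a single pair, assuming dplyr is installed; pair, bounds and pair_norm are illustrative names:

```r
library(dplyr)

pair <- c("setosa", "versicolor")

# Bounds over both species of the pair combined
bounds <- iris %>%
  filter(Species %in% pair) %>%
  summarise(min_m = min(Petal.Width), max_m = max(Petal.Width))

# Rescale that pair's Petal.Width with the shared bounds
pair_norm <- iris %>%
  filter(Species %in% pair) %>%
  select(Petal.Width, Species) %>%
  mutate(Petal.Width = (Petal.Width - bounds$min_m) / (bounds$max_m - bounds$min_m))
```

Repeating this by hand for every pair is what the list-based approach is meant to avoid.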
My idea is to first split the data into a list with one data frame per combination, then map the normalisation function over each element of the list. I have looked at the combn function but cannot get it to split the data.
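The split-then-map idea can be done with combn() itself: with simplify = FALSE, combn() applies FUN to each pair of levels and returns a list. A base-R sketch, where split_lst and norm_lst are illustrative names:

```r
lvls <- levels(iris$Species)

# Split: one data frame per unordered pair of species
split_lst <- combn(lvls, 2, simplify = FALSE, FUN = function(p)
  iris[iris$Species %in% p, c("Petal.Width", "Species")])
names(split_lst) <- combn(lvls, 2, paste, collapse = "_")

# Map: rescale Petal.Width to [0, 1] using each pair's own range
norm_lst <- lapply(split_lst, function(d) {
  rng <- range(d$Petal.Width)
  d$Petal.Width <- (d$Petal.Width - rng[1]) / (rng[2] - rng[1])
  d
})
names(norm_lst)
# "setosa_versicolor" "setosa_virginica" "versicolor_virginica"
```

Note that combn() orders the pairs as setosa_versicolor, setosa_virginica, versicolor_virginica, not in the order listed at the top of the question.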
EDIT: The part below can be ignored. Running:
expand.grid(colnames(iris[, 1:4]), colnames(iris[, 1:4])) %>%
  filter(Var1 != Var2)
gives:
           Var1         Var2
1   Sepal.Width Sepal.Length
2  Petal.Length Sepal.Length
3   Petal.Width Sepal.Length
4  Sepal.Length  Sepal.Width
5  Petal.Length  Sepal.Width
6   Petal.Width  Sepal.Width
7  Sepal.Length Petal.Length
8   Sepal.Width Petal.Length
9   Petal.Width Petal.Length
10 Sepal.Length  Petal.Width
11  Sepal.Width  Petal.Width
12 Petal.Length  Petal.Width
But this does not give unique combinations, since "Sepal.Width & Sepal.Length" is the same pair as "Sepal.Length & Sepal.Width".
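For the column-name pairs, combn() on the names returns each unordered pair once, avoiding the mirrored duplicates that expand.grid() produces. A sketch; num_cols and col_pairs are illustrative names:

```r
num_cols <- names(iris)[1:4]

# Each column of combn()'s result is one unordered pair; transpose to rows
col_pairs <- as.data.frame(t(combn(num_cols, 2)), stringsAsFactors = FALSE)
names(col_pairs) <- c("Var1", "Var2")
col_pairs  # 6 rows instead of 12
```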
We can use combn on the levels of 'Species' and filter the data for each combination:
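library(dplyr)
library(purrr)  # used for pmap() further down
# For each pair of species, keep the rows of those two species and rescale
# every numeric column with the min and max of the pair combined.
# mutate() keeps all rows plus the Species column; summarise() returning more
# than one row per group is deprecated as of dplyr 1.1.0.
out_lst <- combn(levels(iris$Species), 2, FUN = function(x)
  iris %>%
    filter(Species %in% x) %>%
    mutate(across(where(is.numeric), normaliseData)), simplify = FALSE)
# Name each element "level1_level2"; base paste() replaces stringr's str_c(),
# which was used without loading stringr
names(out_lst) <- combn(levels(iris$Species), 2, paste, collapse = "_")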
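If one long data frame is more convenient than a list, the named list can be stacked with dplyr::bind_rows(), whose .id argument turns the list names into a column. A self-contained sketch that repeats the steps above with a one-argument normaliseData so it runs on its own; combined is an illustrative name:

```r
library(dplyr)

# Rescale to [0, 1] using the range of the values passed in
normaliseData <- function(m) (m - min(m)) / (max(m) - min(m))

out_lst <- combn(levels(iris$Species), 2, simplify = FALSE, FUN = function(x)
  iris %>%
    filter(Species %in% x) %>%
    mutate(across(where(is.numeric), normaliseData)))
names(out_lst) <- combn(levels(iris$Species), 2, paste, collapse = "_")

# Stack the list; the list names become the values of a 'pair' column
combined <- bind_rows(out_lst, .id = "pair")
count(combined, pair)  # 100 rows per species pair
```

Each species appears in two pairs, so its rows occur twice in combined, once per pair, scaled differently.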
If this also needs to be done for pairwise combinations of the numeric columns, create a three-column expanded dataset with the levels of 'Species' and the pairs of column names ('Var1', 'Var2'):
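library(tidyr)  # crossing() comes from tidyr
expand_dat <- crossing(Species = levels(iris$Species),
                       Var1 = names(iris)[1:4],
                       Var2 = names(iris)[1:4]) %>%
  # Var1 < Var2 keeps each unordered pair once; Var1 != Var2 would keep both orders
  filter(Var1 < Var2)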
Then use pmap to loop over each row of the dataset: filter the rows of 'iris' where 'Species' is %in% the first element (..1, the Species value from expand_dat), then apply the normaliseData function across the columns named in 'Var1' and 'Var2' (..2 and ..3).
out_lst2 <- expand_dat %>%
  pmap(~ iris %>%
         filter(Species %in% ..1) %>%
         # keep only the two columns of the pair, then rescale each one
         select(all_of(c(..2, ..3))) %>%
         mutate(across(everything(), normaliseData)))
names(out_lst2) <- expand_dat$Species
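The loop above filters one species at a time. If the column pairs should instead be combined with the species pairs from the question, the two can be crossed in a single grid. A sketch assuming dplyr, tidyr and purrr are installed; grid, out_grid and the "_" / "__" naming scheme are illustrative choices:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rescale to [0, 1] using the range of the values passed in
normaliseData <- function(m) (m - min(m)) / (max(m) - min(m))

# Unique species pairs as "a_b" strings, crossed with unique column pairs
species_pairs <- combn(levels(iris$Species), 2, paste, collapse = "_")
grid <- crossing(pair = species_pairs,
                 Var1 = names(iris)[1:4],
                 Var2 = names(iris)[1:4]) %>%
  filter(Var1 < Var2)  # keep each unordered column pair once

# pmap() passes each row's columns to the function by name
out_grid <- pmap(grid, function(pair, Var1, Var2) {
  iris %>%
    filter(Species %in% strsplit(pair, "_", fixed = TRUE)[[1]]) %>%
    select(Species, all_of(c(Var1, Var2))) %>%
    mutate(across(all_of(c(Var1, Var2)), normaliseData))
})
names(out_grid) <- paste(grid$pair, grid$Var1, grid$Var2, sep = "__")
```

This yields 3 species pairs times 6 column pairs, i.e. 18 data frames of 100 rows each. Splitting the pair label on "_" relies on the species names not containing that character, which holds for iris.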