fabha

Reputation: 111

mapply / expand.grid() for argument combinations with a condition

My question builds on an earlier question by another user: mapply for all arguments' combinations [R]

I want to apply a function to multiple arguments using mapply, and my code below already does that. However, I want to add a condition so that NOT ALL tmin and tmax values are combined. Instead, only the first tmin should go with the first tmax, the second tmin with the second tmax, and so on (tmin == 0.01 with tmax == 0.99, tmin == 0.05 with tmax == 0.95, but tmin == 0.01 should never be combined with tmax == 0.95, for example). Each of these tmin/tmax pairs should still be combined with ALL variables, as the expand.grid() call below does.

In the end I should have a data frame like the one called "alltogether", but with 15 rows (3 variables × 5 tmin/tmax pairs) instead of the 75 rows (3 × 5 × 5) I get now.

I could just filter rows with dplyr::filter afterwards, but is there a nice way to include this condition in the function?
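One way to build the condition into the argument grid itself is to cross the variable names with the *position* of each tmin/tmax pair rather than with the raw values, and then look both thresholds up by that position. A minimal sketch, assuming the tmin/tmax vectors from the question; the names `vars`, `grid` and `args` are illustrative only:

```r
# tmin/tmax values that belong together (same position = same pair)
tmin <- c(0.01, 0.05, 0.1, 0.2, 0.25)
tmax <- c(0.99, 0.95, 0.9, 0.8, 0.75)
vars <- c("Var2", "Var4", "Var5")

# Cross the variable names with the pair index, not with the raw values,
# so tmin[k] is only ever combined with tmax[k]
grid <- expand.grid(var = vars, pair = seq_along(tmin),
                    stringsAsFactors = FALSE)

# Look up both thresholds through the shared index
args <- data.frame(var  = grid$var,
                   tmin = tmin[grid$pair],
                   tmax = tmax[grid$pair])
nrow(args)  # 3 variables x 5 pairs = 15 rows
```

The resulting `args` can then be fed to mapply() exactly like the 75-row grid, just without the extra combinations.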

Here is an example data frame:

dataframe <- data.frame(personID = 1:10,
                        Var1 = c(4, 6, 3, 3, 7, 1, 20, NA, 12, 2),
                        Var2 = c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA),
                        Var3 = c(NA, 15, 12, 0, NA, NA, 2, 7, 6, 7),
                        Var4 = c(0, 0, 0, 0, 1, 0, 1, 4, 2, 1),
                        Var5 = c(12, 15, 11, 10, 10, 15, NA, 10, 13, 11))

And here is the code I have so far:

des <- function(var, tmin, tmax){
  # keep only the values between the tmin and tmax quantiles
  v <- var[var >= quantile(var, probs = tmin, na.rm = TRUE) &
             var <= quantile(var, probs = tmax, na.rm = TRUE)]
  d <- psych::describe(v)
  # note: when called through mapply, deparse(substitute(var)) returns an
  # internal expression rather than the column name (e.g. "Var2")
  df <- cbind(variable = deparse(substitute(var)), tmin = tmin, tmax = tmax, d)
  print(df)
}
args <- expand.grid(var = dataframe[, c("Var2", "Var4", "Var5")],
                    tmin = c(0.01, 0.05, 0.1, 0.2, 0.25),
                    tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))

alltogether <- do.call("rbind", mapply(FUN = des, var = args$var, tmin = args$tmin, tmax = args$tmax,  SIMPLIFY = FALSE))
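A sketch of the same pipeline that produces only the 15 wanted rows and also keeps the real variable names, assuming you pass column *names* to mapply instead of the columns themselves, so the name doubles as the label. `des_by_name` is a hypothetical helper, and to keep the block self-contained it computes a few base-R statistics as a stand-in for psych::describe():

```r
dataframe <- data.frame(personID = 1:10,
                        Var1 = c(4, 6, 3, 3, 7, 1, 20, NA, 12, 2),
                        Var2 = c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA),
                        Var3 = c(NA, 15, 12, 0, NA, NA, 2, 7, 6, 7),
                        Var4 = c(0, 0, 0, 0, 1, 0, 1, 4, 2, 1),
                        Var5 = c(12, 15, 11, 10, 10, 15, NA, 10, 13, 11))

# Hypothetical helper: takes a column *name*, so the label is exact
des_by_name <- function(name, tmin, tmax, data = dataframe) {
  var <- data[[name]]
  lo <- quantile(var, probs = tmin, na.rm = TRUE)
  hi <- quantile(var, probs = tmax, na.rm = TRUE)
  v <- var[!is.na(var) & var >= lo & var <= hi]  # values inside [lo, hi]
  # base-R stand-in for psych::describe(v)
  data.frame(variable = name, tmin = tmin, tmax = tmax,
             n = length(v), mean = mean(v), sd = sd(v), median = median(v))
}

tmin <- c(0.01, 0.05, 0.1, 0.2, 0.25)
tmax <- c(0.99, 0.95, 0.9, 0.8, 0.75)
vars <- c("Var2", "Var4", "Var5")

# rep(..., times) cycles the names; rep(..., each) moves tmin and tmax
# forward in lockstep, so every name meets every matched pair once
alltogether <- do.call(rbind, mapply(des_by_name,
                                     name = rep(vars, times = length(tmin)),
                                     tmin = rep(tmin, each = length(vars)),
                                     tmax = rep(tmax, each = length(vars)),
                                     SIMPLIFY = FALSE, USE.NAMES = FALSE))
```

Swapping the stand-in statistics back to psych::describe(v) should not change the pairing logic, which lives entirely in the two rep() calls.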

Thank you for helping!

Edit:

The expected output is what I get after filtering the "alltogether" data frame with the following code (15 obs. of 16 variables):

library(dplyr)

alltogether <- alltogether %>%
  dplyr::filter((tmin == 0.01 & tmax == 0.99) |
                (tmin == 0.05 & tmax == 0.95) |
                (tmin == 0.1 & tmax == 0.9) |
                (tmin == 0.2 & tmax == 0.8) |
                (tmin == 0.25 & tmax == 0.75))
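If filtering afterwards is acceptable, the five hard-coded conditions can be replaced by a positional check: keep a row only when its tmin and its tmax sit at the same position in their original vectors. A minimal base-R sketch, using a stand-in for the 75-row "alltogether" built with expand.grid (only the tmin/tmax columns matter for the filter):

```r
tmin <- c(0.01, 0.05, 0.1, 0.2, 0.25)
tmax <- c(0.99, 0.95, 0.9, 0.8, 0.75)

# Stand-in for the 75-row alltogether: only var, tmin and tmax are needed here
alltogether <- expand.grid(var = c("Var2", "Var4", "Var5"),
                           tmin = tmin, tmax = tmax,
                           stringsAsFactors = FALSE)

# Keep a row only when its tmin and tmax come from the same position
same_pair <- match(alltogether$tmin, tmin) == match(alltogether$tmax, tmax)
alltogether <- alltogether[same_pair, ]
nrow(alltogether)  # 15
```

Inside dplyr::filter() the same expression should work, as long as the lookup vectors are named differently from the columns (e.g. tmin_vals), because filter() resolves tmin to the data column first.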

Upvotes: 2

Views: 502

Answers (1)

Yannis Vassiliadis

Reputation: 1709

OK, here's a solution to both problems. Unfortunately, I couldn't get it to work with mapply, so I had to rely on a good old for loop (it's still faster, because it no longer computes the 60 combinations you would otherwise filter out). I also changed the function so that it returns the variable names, as you wanted. The biggest difference is that I use merge instead of expand.grid. Finally, it incorporates your comment from above.
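The reason merge() helps is that, when its two inputs share no column names, it returns their Cartesian product (a cross join). Because minmax keeps each tmin and tmax in the same row, crossing the variable names with whole rows of minmax yields 3 × 5 = 15 combinations and never splits a pair. A quick check, assuming the minmax data frame defined in the answer:

```r
minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25),
                     tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))

# No shared column names, so merge() returns every (name, row of minmax) combination
args <- merge(c("Var2", "Var4", "Var5"), minmax)
nrow(args)  # 15
```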

des <- function(var, tmin, tmax, cor.var, cor.method = c("spearman", "pearson", "kendall")){
  # set values outside the [tmin, tmax] quantile range to NA instead of
  # dropping them, so var stays aligned row by row with cor.var
  var[var < quantile(var, probs = tmin, na.rm = TRUE) |
        var > quantile(var, probs = tmax, na.rm = TRUE)] <- NA
  d <- psych::describe(var)
  correlation <- cor(cor.var, var, use = "pairwise.complete.obs",
                     method = match.arg(cor.method))
  # var is a one-column data frame, so names(var) is the column name
  df <- cbind(variable = names(var), tmin = tmin, tmax = tmax, d, correlation)
  names(df)[length(names(df))] <- paste0("correlation_with_", names(cor.var))
  print(df)
}

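minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25), tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
# merge() with no common column names returns every combination of its inputs
args <- merge(c("Var2", "Var4", "Var5"), minmax)
# make sure the name column is character (it is a factor in R < 4.0)
args[, 1] <- as.character(args[, 1])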

alltogether <- NULL
for (i in 1:nrow(args)){
  alltogether <- rbind(alltogether, des(var = dataframe[args[i, 1]],
                                        tmin = args[i, 2], tmax = args[i, 3],
                                        cor.var = dataframe["Var1"]))
}
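Since args already has exactly one row per call, the loop can also be expressed with mapply() over its columns, binding the results once at the end instead of growing alltogether inside the loop. A sketch under stated assumptions: `des_stub` is a hypothetical base-R stand-in for the answer's des() (no psych, Spearman only) that passes the column *name* and looks the data up inside, while keeping the answer's idea of setting out-of-range values to NA so the vector stays aligned with Var1 for cor():

```r
dataframe <- data.frame(personID = 1:10,
                        Var1 = c(4, 6, 3, 3, 7, 1, 20, NA, 12, 2),
                        Var2 = c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA),
                        Var3 = c(NA, 15, 12, 0, NA, NA, 2, 7, 6, 7),
                        Var4 = c(0, 0, 0, 0, 1, 0, 1, 4, 2, 1),
                        Var5 = c(12, 15, 11, 10, 10, 15, NA, 10, 13, 11))

# Hypothetical stand-in for the answer's des(): base R only, no psych
des_stub <- function(name, tmin, tmax, cor.var = dataframe$Var1) {
  var <- dataframe[[name]]
  lo <- quantile(var, probs = tmin, na.rm = TRUE)
  hi <- quantile(var, probs = tmax, na.rm = TRUE)
  var[var < lo | var > hi] <- NA  # trim to NA, keeping alignment with cor.var
  data.frame(variable = name, tmin = tmin, tmax = tmax,
             n = sum(!is.na(var)), mean = mean(var, na.rm = TRUE),
             correlation_with_Var1 = cor(cor.var, var,
                                         use = "pairwise.complete.obs",
                                         method = "spearman"))
}

minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25),
                     tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
args <- merge(c("Var2", "Var4", "Var5"), minmax)
args[, 1] <- as.character(args[, 1])

# One call per row of args; results are collected in a list and bound once
alltogether <- do.call(rbind, mapply(des_stub, args[, 1], args$tmin, args$tmax,
                                     SIMPLIFY = FALSE, USE.NAMES = FALSE))
```

For 15 rows the growing rbind() in the loop is perfectly fine; the list-then-bind pattern mainly matters when the number of calls gets large.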

Upvotes: 1
