My question builds on a previously posted one: "mapply for all arguments' combinations [R]".
I want to apply a function to multiple arguments using mapply, and my code below does this. But I want to add a condition so that NOT ALL tmin and tmax values are combined with each other. Instead, only the first tmin should be paired with the first tmax, the second tmin with the second tmax, and so on (e.g. tmin == 0.01 with tmax == 0.99, or tmin == 0.05 with tmax == 0.95, but tmin == 0.01 should never be combined with tmax == 0.95). Each of these tmin/tmax pairs, however, should still be combined with ALL variables, as the expand.grid() call below does.
In the end I want a data frame like the one called "alltogether", but with 15 rows (3 variables × 5 tmin/tmax pairs) instead of the 75 rows (3 variables × 5 tmin values × 5 tmax values) I get now.
I could just filter the rows with dplyr::filter afterwards, but is there a nice way to build this condition into the function call itself?
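To make the 75-versus-15 arithmetic concrete, here is a minimal sketch that expands only the variable names (not the data columns, as the code below does) and compares two argument grids. The names `vars`, `tmins`, `tmaxs`, `all_combos`, `paired` and the helper column `pair` are hypothetical, chosen for illustration. The idea is to expand the variables over the *index* of each tmin/tmax pair rather than over tmin and tmax independently.

```r
vars  <- c("Var2", "Var4", "Var5")
tmins <- c(0.01, 0.05, 0.1, 0.2, 0.25)
tmaxs <- c(0.99, 0.95, 0.9, 0.8, 0.75)

# Current approach: tmin and tmax expanded independently -> 3 * 5 * 5 rows
all_combos <- expand.grid(var = vars, tmin = tmins, tmax = tmaxs,
                          stringsAsFactors = FALSE)

# Paired approach: expand the variables over the position of each tmin/tmax pair -> 3 * 5 rows
paired <- expand.grid(var = vars, pair = seq_along(tmins),
                      stringsAsFactors = FALSE)
paired$tmin <- tmins[paired$pair]   # look up the i-th tmin ...
paired$tmax <- tmaxs[paired$pair]   # ... and the matching i-th tmax

nrow(all_combos)  # 75
nrow(paired)      # 15
```

The columns of `paired` could then be passed to mapply in the same way as `args$tmin` and `args$tmax`.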
Here is an example data frame:
dataframe <- data.frame(personID = 1:10,
                        Var1 = c(4, 6, 3, 3, 7, 1, 20, NA, 12, 2),
                        Var2 = c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA),
                        Var3 = c(NA, 15, 12, 0, NA, NA, 2, 7, 6, 7),
                        Var4 = c(0, 0, 0, 0, 1, 0, 1, 4, 2, 1),
                        Var5 = c(12, 15, 11, 10, 10, 15, NA, 10, 13, 11))
And here is the code I have so far:
des <- function(var, tmin, tmax){
  v <- var[var >= quantile(var, probs = tmin, na.rm = TRUE) &
           var <= quantile(var, probs = tmax, na.rm = TRUE)]
  d <- psych::describe(v)
  df <- cbind(variable = deparse(substitute(var)), tmin = tmin, tmax = tmax, d)
  print(df)
}
args <- expand.grid(var = dataframe[, c("Var2", "Var4", "Var5")],
                    tmin = c(0.01, 0.05, 0.1, 0.2, 0.25),
                    tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
alltogether <- do.call("rbind", mapply(FUN = des, var = args$var, tmin = args$tmin,
                                       tmax = args$tmax, SIMPLIFY = FALSE))
Thank you for helping!
Edit:
The expected output is what I get after filtering the "alltogether" data frame with the following code (15 obs. of 16 variables):
library(dplyr)  # provides the %>% pipe
alltogether <- alltogether %>%
  dplyr::filter((tmin == 0.01 & tmax == 0.99) |
                (tmin == 0.05 & tmax == 0.95) |
                (tmin == 0.1  & tmax == 0.9)  |
                (tmin == 0.2  & tmax == 0.8)  |
                (tmin == 0.25 & tmax == 0.75))
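The hard-coded filter above can be generalised: a row belongs to the wanted set exactly when its tmin and its tmax sit at the same position in their respective vectors. The sketch below is a base-R version of that idea, using a toy stand-in for "alltogether" that has only the columns the filter looks at (the names `tmins`, `tmaxs` and `keep` are hypothetical). Exact `match()` on doubles is safe here only because the values come from the same literals.

```r
tmins <- c(0.01, 0.05, 0.1, 0.2, 0.25)
tmaxs <- c(0.99, 0.95, 0.9, 0.8, 0.75)

# Toy stand-in for "alltogether": all 75 variable/tmin/tmax combinations
alltogether <- expand.grid(variable = c("Var2", "Var4", "Var5"),
                           tmin = tmins, tmax = tmaxs,
                           stringsAsFactors = FALSE)

# Keep a row only if tmin and tmax have the same position in their vectors
keep <- match(alltogether$tmin, tmins) == match(alltogether$tmax, tmaxs)
filtered <- alltogether[keep, ]
nrow(filtered)  # 15
```

The same condition should also work inside a dplyr call, e.g. `dplyr::filter(alltogether, match(tmin, tmins) == match(tmax, tmaxs))`, so adding a sixth pair only means extending the two vectors.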
OK, here's a solution to both problems. Unfortunately, I couldn't get one using mapply, so I had to rely on a good old for loop (it is still faster, because it only computes the 15 combinations you need instead of all 75). I also changed the function so that it returns the names of the variables, as you wanted. The biggest difference is that I'm using merge instead of expand.grid. Finally, it incorporates your comment from above by adding a correlation with a second variable (cor.var).
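des <- function(var, tmin, tmax, cor.var, cor.method = c("spearman", "pearson", "kendall")){
  # var is a one-column data frame; values outside the [tmin, tmax] quantiles are set to NA
  # (instead of being dropped) so that var stays aligned with cor.var
  var[var < quantile(var, probs = tmin, na.rm = TRUE) |
      var > quantile(var, probs = tmax, na.rm = TRUE)] <- NA
  d <- psych::describe(var)
  correlation <- cor(cor.var, var, use = "pairwise.complete.obs",
                     method = match.arg(cor.method))
  df <- cbind(variable = names(var), tmin = tmin, tmax = tmax, d, correlation)
  # name the last column after the correlation variable, e.g. "correlation_with_Var1"
  names(df)[length(names(df))] <- paste0("correlation_with_", names(cor.var))
  print(df)
}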
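Setting out-of-range values to NA, rather than subsetting them away as the question's version does, matters for the correlation: `cor()` needs both vectors to have the same length and to stay aligned by person. A small sketch with Var1 and Var2 from the example data and illustrative cut-offs tmin = 0.1, tmax = 0.9 (the names `x`, `y`, `y_sub`, `y_na` are hypothetical):

```r
x <- c(4, 6, 3, 3, 7, 1, 20, NA, 12, 2)   # Var1 from the example data
y <- c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA)   # Var2 from the example data

lo <- quantile(y, probs = 0.1, na.rm = TRUE)
hi <- quantile(y, probs = 0.9, na.rm = TRUE)

# Subsetting (question's approach): the trimmed vector gets shorter
y_sub <- y[!is.na(y) & y >= lo & y <= hi]

# NA-masking (answer's approach): same length, positions preserved
y_na <- y
y_na[!is.na(y_na) & (y_na < lo | y_na > hi)] <- NA

length(y_sub)  # 8, so cor(x, y_sub) would fail with "incompatible dimensions"
length(y_na)   # 10, still aligned with x

# Only person-wise complete pairs enter the correlation
r <- cor(x, y_na, use = "pairwise.complete.obs", method = "spearman")
```

psych::describe() ignores the NAs, so the descriptive statistics are the same either way; only the correlation depends on the alignment.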
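# the five tmin/tmax pairs, one pair per row
minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25), tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
# merge() with no common columns returns the Cartesian product: 3 variables x 5 pairs = 15 rows
args <- merge(c("Var2", "Var4", "Var5"), minmax)
# older R versions turn the variable names into a factor; make sure they are character
args[, 1] <- as.character(args[, 1])
alltogether <- NULL
for (i in seq_len(nrow(args))){
  alltogether <- rbind(alltogether, des(var = dataframe[args[i, 1]],
                                        tmin = args[i, 2], tmax = args[i, 3],
                                        cor.var = dataframe["Var1"]))
}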
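Since merge() already yields one row per variable and tmin/tmax pair, the for loop above can probably be replaced by mapply after all. The sketch below uses a hypothetical, simplified stand-in `des_simple()` (base R only, reporting n and mean instead of psych::describe() and without the correlation) so that it runs on its own, together with the relevant columns of the example data.

```r
dataframe <- data.frame(personID = 1:10,
                        Var2 = c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA),
                        Var4 = c(0, 0, 0, 0, 1, 0, 1, 4, 2, 1),
                        Var5 = c(12, 15, 11, 10, 10, 15, NA, 10, 13, 11))

# Hypothetical simplified stand-in for des(): trims by quantiles, reports n and mean
des_simple <- function(varname, tmin, tmax) {
  v  <- dataframe[[varname]]
  lo <- quantile(v, probs = tmin, na.rm = TRUE)
  hi <- quantile(v, probs = tmax, na.rm = TRUE)
  v  <- v[!is.na(v) & v >= lo & v <= hi]
  data.frame(variable = varname, tmin = tmin, tmax = tmax,
             n = length(v), mean = mean(v))
}

minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25),
                     tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
vars <- data.frame(variable = c("Var2", "Var4", "Var5"), stringsAsFactors = FALSE)

# No shared columns, so merge() returns the Cartesian product: 15 rows
args <- merge(vars, minmax)

# One call per row of args; USE.NAMES = FALSE keeps plain row names after rbind
alltogether <- do.call(rbind, mapply(des_simple,
                                     varname = args$variable,
                                     tmin = args$tmin, tmax = args$tmax,
                                     SIMPLIFY = FALSE, USE.NAMES = FALSE))
nrow(alltogether)  # 15
```

With the answer's full des(), the same pattern should work (untested) by passing `var = lapply(args[, 1], function(v) dataframe[v])` and supplying the fixed correlation variable through `MoreArgs = list(cor.var = dataframe["Var1"])`.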