Reputation: 533
I have following two sample dataframes:
df1 <- data.frame(EVI_GT=c(0.23, 0.54, 0.36, 0.92), EVI_GNT=c(0.33, 0.65, 0.42, 0.73), EVI_GGT=c(0.43, 0.34, 0.22, 0.98))
df2 <- data.frame(T_ET_GT=c(0.56, 0.23, 0.95, 0.82), T_ET_GNT=c(0.10, 0.74, 0.36, 0.35), T_ET_GGT=c(0.52, 0.31, 0.65, 0.58))
I have to extract values from df2 corresponding to min and max of df1 (each row). For example, min (max) value of first row in df1 is 0.23 (0.43) i.e., column 1 (column 3) so the values that should be extracted from df2 will be 0.56 and 0.52 for the first row. Similar for row 2 and so on. Below is my desired output dataframe:
df3 <- data.frame(column1=c(0.56, 0.31, 0.65, 0.35), column2=c(0.52, 0.74, 0.36, 0.58))
How can we get df3 from df2 using conditions on df1?
Upvotes: 3
Views: 1114
Reputation: 4487
An approach using purrr
library(dplyr)
library(purrr)
df1 %>%
# list of row for df1
pmap(~c(...)) %>%
map2_dfr(.y = df2 %>% pmap(~c(...)), # map with list of row df2
.f = function(a, b) { # function that take min/max each row of df1 and extract df2
min_index <- which.min(a)
max_index <- which.max(a)
tibble(min = b[min_index], max = b[max_index])
})
# Output
# A tibble: 4 x 2
min max
<dbl> <dbl>
1 0.56 0.52
2 0.31 0.74
3 0.65 0.36
4 0.35 0.580
Upvotes: 2
Reputation: 389012
You can use which.min
and which.max
to get index of minimum and maximum value respectively. Use apply
to perform rowwise operation and subset the data from df2
.
data.frame(column1 = df2[cbind(1:nrow(df1), apply(df1, 1, which.min))],
column2 = df2[cbind(1:nrow(df1), apply(df1, 1, which.max))])
# column1 column2
#1 0.56 0.52
#2 0.31 0.74
#3 0.65 0.36
#4 0.35 0.58
Upvotes: 4
Reputation: 1751
Assuming your dataframes have the same dimensions, that should be fairly easy!
A very intuitive and simple way would be looping for the number of rows in df1
(or df2
) and finding the column which elements are max and min for every row in df1, thus using that information to subset df2 and attribute that value to df3.
df3 <- data.frame(
min = NA,
max = NA
)
for (i in seq_len(nrow(df1))) {
max_val <- which.max(df1[i, ])
min_val <- which.min(df1[i, ])
df3[i, 1] <- df2[i, min_val]
df3[i, 2] <- df2[i, max_val]
}
A more "dynamic" way of doing that would be extracting the "which.max" and "which.min" from df1 row by row (through an apply statement), thus forming a list of indexes. Then, one could define a matrix of row,col pairs (think of it as coordinates!) for the first and second conditions (min and max values).
indexes <- apply(df1, MARGIN = 1, function(x) {
return(list(min_idx = which.min(x), max_idx = which.max(x)))
})
indexes <- dplyr::bind_rows(indexes)
indexes$row <- 1:nrow(indexes)
mins_indexes <- as.matrix(dplyr::select(indexes, c("row", "min_idx")))
maxes_indexes <- as.matrix(dplyr::select(indexes, c("row", "max_idx")))
df3 <- data.frame(
min_vals = df2[mins_indexes],
max_vals = df2[maxes_indexes]
)
This solution is loosely based on this problem Selecting specific elements from a matrix all at once!
NOTE: I've made the process as intuitive as possible, you could certainly use more clever names and maybe use less lines of code.
Upvotes: 2