raghav
raghav

Reputation: 533

Extract values from a DataFrame based on condition on another DataFrame in R

I have following two sample dataframes:

df1 <- data.frame(EVI_GT=c(0.23, 0.54, 0.36, 0.92), EVI_GNT=c(0.33, 0.65, 0.42, 0.73), EVI_GGT=c(0.43, 0.34, 0.22, 0.98))
df2 <- data.frame(T_ET_GT=c(0.56, 0.23, 0.95, 0.82), T_ET_GNT=c(0.10, 0.74, 0.36, 0.35), T_ET_GGT=c(0.52, 0.31, 0.65, 0.58))

I have to extract values from df2 corresponding to min and max of df1 (each row). For example, min (max) value of first row in df1 is 0.23 (0.43) i.e., column 1 (column 3) so the values that should be extracted from df2 will be 0.56 and 0.52 for the first row. Similar for row 2 and so on. Below is my desired output dataframe:

df3 <- data.frame(column1=c(0.56, 0.31, 0.65, 0.35), column2=c(0.52, 0.74, 0.36, 0.58))

How can we get df3 from df2 using conditions on df1?

Upvotes: 3

Views: 1114

Answers (3)

Sinh Nguyen
Sinh Nguyen

Reputation: 4487

An approach using purrr

library(dplyr)
library(purrr)

df1 %>%
  # list of row for df1
  pmap(~c(...)) %>%
  map2_dfr(.y = df2 %>% pmap(~c(...)), # map with list of row df2
    .f = function(a, b) { # function that take min/max each row of df1 and extract df2
      min_index <- which.min(a)
      max_index <- which.max(a)
      tibble(min = b[min_index], max = b[max_index])
    })

# Output
# A tibble: 4 x 2
    min   max
  <dbl> <dbl>
1  0.56 0.52 
2  0.31 0.74 
3  0.65 0.36 
4  0.35 0.580

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389012

You can use which.min and which.max to get index of minimum and maximum value respectively. Use apply to perform rowwise operation and subset the data from df2.

data.frame(column1 = df2[cbind(1:nrow(df1), apply(df1, 1, which.min))],
           column2 = df2[cbind(1:nrow(df1), apply(df1, 1, which.max))])

#  column1 column2
#1    0.56    0.52
#2    0.31    0.74
#3    0.65    0.36
#4    0.35    0.58

Upvotes: 4

eduardokapp
eduardokapp

Reputation: 1751

Assuming your dataframes have the same dimensions, that should be fairly easy!

A very intuitive and simple way would be looping for the number of rows in df1 (or df2) and finding the column which elements are max and min for every row in df1, thus using that information to subset df2 and attribute that value to df3.

df3 <- data.frame(
  min = NA,
  max = NA
)

for (i in seq_len(nrow(df1))) {
  max_val <- which.max(df1[i, ])
  min_val <- which.min(df1[i, ])
  df3[i, 1] <- df2[i, min_val]
  df3[i, 2] <- df2[i, max_val]
}

A more "dynamic" way of doing that would be extracting the "which.max" and "which.min" from df1 row by row (through an apply statement), thus forming a list of indexes. Then, one could define a matrix of row,col pairs (think of it as coordinates!) for the first and second conditions (min and max values).

indexes <- apply(df1, MARGIN = 1, function(x) {
  return(list(min_idx = which.min(x), max_idx = which.max(x)))
})

indexes <- dplyr::bind_rows(indexes)
indexes$row <- 1:nrow(indexes)
mins_indexes <- as.matrix(dplyr::select(indexes, c("row", "min_idx")))
maxes_indexes <- as.matrix(dplyr::select(indexes, c("row", "max_idx")))

df3 <- data.frame(
  min_vals = df2[mins_indexes],
  max_vals = df2[maxes_indexes]
)

This solution is loosely based on this problem Selecting specific elements from a matrix all at once!

NOTE: I've made the process as intuitive as possible, you could certainly use more clever names and maybe use less lines of code.

Upvotes: 2

Related Questions