Sam Pickwick
Sam Pickwick

Reputation: 313

Take max with condition

I have a data frame with a transaction date txn_date and 6 more date columns (date_1, date_2, etc) that can take values after or before the txn_date for a given observation.

I need to take the max of the date_ s for a given observation but only taking the max amongst the dates that are less than the txn_date.

Date frame looks like this (all columns as.Date in the actual date frame but just numeric here for ease)

dat1 <- read.table(text = "txn_date date_0 date_1 date_2 date_3 date_4 date_5 
50 30 40 50 60 70 75
60 30 40 50 60 70 75", header = TRUE)

And ideally it would look like this:

dat1 <- read.table(text = "txn_date date_0 date_1 date_2 date_3 date_4 date_5 max
50 30 40 50 60 70 75 40
60 30 40 50 60 70 75 50", header = TRUE)

Upvotes: 1

Views: 133

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101099

Another base R option using max.col

dat1$Max <- dat1[-1][
  cbind(
    1:nrow(dat1),
    max.col(dat1[-1] < dat1[, 1], ties.method = "last")
  )
]

gives

> dat1
  txn_date date_0 date_1 date_2 date_3 date_4 date_5 Max
1       50     30     40     50     60     70     75  40
2       60     30     40     50     60     70     75  50

dat1$max <- apply(dat1, 1, function(x) max(x[-1][x[-1] < x[1]], na.rm = TRUE))
dat1$max
[1] 40 50

If there are rows that doesn't meet the condition, use an if/else

apply(dat1, 1, function(x) if(any(x[-1] < x[1])) max(x[-1][x[-1] < x[1]], na.rm = TRUE) else NA_real_)
[1] 40 50

We could use apply in base R

dat1$max <- apply(dat1, 1, function(x) max(x[-1][x[-1] < x[1]], na.rm = TRUE))
dat1$max
[1] 40 50

If there are rows that doesn't meet the condition, use an if/else

apply(dat1, 1, function(x) if(any(x[-1] < x[1])) max(x[-1][x[-1] < x[1]], na.rm = TRUE) else NA_real_)
[1] 40 50

Or use a much faster version of apply with dapply (from collapse)

library(collapse)
dapply(dat1, MARGIN = 1, FUN = function(x)
       max(x[-1][x[-1] < x[1]], na.rm = TRUE))
#[1] 40 50

Or make it faster with a vectorized option

do.call(pmax, c(replace(dat1[-1], dat1[-1] >= dat1[[1]], NA), na.rm = TRUE))
[1] 40 50

Or using tidyverse

library(dplyr)
library(purrr)
dat1 %>%
    mutate(max = pmap_dbl(., ~ {
                 tmp <- c(...)
                 max(tmp[-1][tmp[-1] < tmp[1]], na.rm = TRUE)}))

-outupt

   txn_date date_0 date_1 date_2 date_3 date_4 date_5 max
1       50     30     40     50     60     70     75  40
2       60     30     40     50     60     70     75  50

Upvotes: 3

GuedesBF
GuedesBF

Reputation: 9858

We can use dplyr and purrr with rowwise()and max, keeping only columns with values < txn_date to feed to max

library(dplyr)
library(purrr)

dat1 %>% rowwise() %>%
        mutate(max=max(keep(c_across(date_0:date_5), ~ .x < txn_date), na.rm=TRUE))

# A tibble: 2 x 8
# Rowwise: 
  txn_date date_0 date_1 date_2 date_3 date_4 date_5   max
     <int>  <int>  <int>  <int>  <int>  <int>  <int> <int>
1       50     30     40     50     60     70     75    40
2       60     30     40     50     60     70     75    50

Upvotes: 1

Related Questions