Reputation: 313
I have a data frame with a transaction date txn_date
and 6 more date columns (date_1
, date_2
, etc) that can take values after or before the txn_date
for a given observation.
I need to take the max of the date_
s for a given observation but only taking the max amongst the dates that are less than the txn_date
.
Date frame looks like this (all columns as.Date in the actual date frame but just numeric here for ease)
dat1 <- read.table(text = "txn_date date_0 date_1 date_2 date_3 date_4 date_5
50 30 40 50 60 70 75
60 30 40 50 60 70 75", header = TRUE)
And ideally it would look like this:
dat1 <- read.table(text = "txn_date date_0 date_1 date_2 date_3 date_4 date_5 max
50 30 40 50 60 70 75 40
60 30 40 50 60 70 75 50", header = TRUE)
Upvotes: 1
Views: 133
Reputation: 101099
Another base R option using max.col
dat1$Max <- dat1[-1][
cbind(
1:nrow(dat1),
max.col(dat1[-1] < dat1[, 1], ties.method = "last")
)
]
gives
> dat1
txn_date date_0 date_1 date_2 date_3 date_4 date_5 Max
1 50 30 40 50 60 70 75 40
2 60 30 40 50 60 70 75 50
dat1$max <- apply(dat1, 1, function(x) max(x[-1][x[-1] < x[1]], na.rm = TRUE))
dat1$max
[1] 40 50
If there are rows that doesn't meet the condition, use an if/else
apply(dat1, 1, function(x) if(any(x[-1] < x[1])) max(x[-1][x[-1] < x[1]], na.rm = TRUE) else NA_real_)
[1] 40 50
We could use apply in base R
dat1$max <- apply(dat1, 1, function(x) max(x[-1][x[-1] < x[1]], na.rm = TRUE))
dat1$max
[1] 40 50
If there are rows that doesn't meet the condition, use an if/else
apply(dat1, 1, function(x) if(any(x[-1] < x[1])) max(x[-1][x[-1] < x[1]], na.rm = TRUE) else NA_real_)
[1] 40 50
Or use a much faster version of apply with dapply (from collapse)
library(collapse)
dapply(dat1, MARGIN = 1, FUN = function(x)
max(x[-1][x[-1] < x[1]], na.rm = TRUE))
#[1] 40 50
Or make it faster with a vectorized option
do.call(pmax, c(replace(dat1[-1], dat1[-1] >= dat1[[1]], NA), na.rm = TRUE))
[1] 40 50
Or using tidyverse
library(dplyr)
library(purrr)
dat1 %>%
mutate(max = pmap_dbl(., ~ {
tmp <- c(...)
max(tmp[-1][tmp[-1] < tmp[1]], na.rm = TRUE)}))
-outupt
txn_date date_0 date_1 date_2 date_3 date_4 date_5 max
1 50 30 40 50 60 70 75 40
2 60 30 40 50 60 70 75 50
Upvotes: 3
Reputation: 9858
We can use dplyr
and purrr
with rowwise()
and max
, keep
ing only columns with values < txn_date to feed to max
library(dplyr)
library(purrr)
dat1 %>% rowwise() %>%
mutate(max=max(keep(c_across(date_0:date_5), ~ .x < txn_date), na.rm=TRUE))
# A tibble: 2 x 8
# Rowwise:
txn_date date_0 date_1 date_2 date_3 date_4 date_5 max
<int> <int> <int> <int> <int> <int> <int> <int>
1 50 30 40 50 60 70 75 40
2 60 30 40 50 60 70 75 50
Upvotes: 1