Reputation: 93
Suppose I have the data.frame below. I want to create a new column called duration that is calculated only for records where status = Active, using 2016-12-10 as today's date, so that duration = today - start_date.
What's the best approach for this conditional calculation?
status <- c("Active", "Inactive", "Active")
date <- c("2016-10-25", "2015-05-11", "2015-03-18")
start_date <- as.Date(date, format = "%Y-%m-%d")
df1 <- data.frame(status, start_date)
Upvotes: 0
Views: 1691
Reputation: 2496
Using dplyr, you can try:
library(dplyr)
today <- as.Date("2016-12-10")
dft %>%
  mutate(duration = ifelse(status == "Active", today - start_date, NA))
where dft is your initial data frame.
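The same ifelse() logic also works without dplyr. Below is a minimal base R sketch, assuming the question's data is stored as df1 (instead of dft) and that today is fixed at 2016-12-10. It also shows that ifelse() drops the difftime class, so duration ends up as a plain number of days, with NA for the inactive row.

```r
# Sample data from the question; the name df1 is an assumption (the answer uses dft)
status <- c("Active", "Inactive", "Active")
start_date <- as.Date(c("2016-10-25", "2015-05-11", "2015-03-18"))
df1 <- data.frame(status, start_date)

# Fixed reference date from the question
today <- as.Date("2016-12-10")

# Compute today - start_date for every row and keep it only where status is "Active";
# ifelse() returns a plain numeric vector (days), not a difftime
df1$duration <- ifelse(df1$status == "Active", today - df1$start_date, NA)

print(df1)
```

If you need the "days" unit to be kept, convert afterwards, for example with as.difftime(df1$duration, units = "days").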
Upvotes: 0
Reputation: 886938
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), create the logical index in 'i', and assign (:=) the difference between 'today' and 'start_date' as the 'duration' column. This is efficient because it assigns in place (by reference).
library(data.table)
setDT(df1)[status == "Active", duration := today - start_date]
df1
#      status start_date duration
# 1:   Active 2016-10-25  46 days
# 2: Inactive 2015-05-11  NA days
# 3:   Active 2015-03-18 633 days
Or, a base R option:
i1 <- df1$status == "Active"
df1[i1, "duration"] <- today - df1$start_date[i1]
where today is defined as:
today <- as.Date("2016-12-10")
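One way to make sure the base R column is a difftime in days (as in the data.table output) is to create it up front as a column of NA durations and then fill in only the Active rows. A sketch, assuming the question's data is stored as df1; the pre-filling step is an addition, not part of the answer above.

```r
# Sample data from the question, assumed to be named df1
status <- c("Active", "Inactive", "Active")
start_date <- as.Date(c("2016-10-25", "2015-05-11", "2015-03-18"))
df1 <- data.frame(status, start_date)
today <- as.Date("2016-12-10")

# Logical index of the rows to update
i1 <- df1$status == "Active"

# Pre-fill the column with NA durations that already carry the difftime class
# (units "days"), so the subset assignment below keeps that class
df1$duration <- as.difftime(rep(NA_real_, nrow(df1)), units = "days")
df1$duration[i1] <- today - df1$start_date[i1]

print(df1)
```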
Upvotes: 3