Reputation: 73
I have a data.table with about 100k rows. I am going to simplify this to only 3 columns because that is all that is relevant here.
dt <- data.table(indicator = c("x", "y"),
date1 = c("20190111", "20190212", "20190512", "20190723"),
date2 = c("20190105", "20190215", "20190616", "20190623"))
What I want to do is assign either date1 or date2 to a new column, 'final_date' depending on the indicator column. If indicator is "x" assign final_date as date1. If indicator "y" assign final_date as date2.
I am able to do this with a "for" loop and if/else statements, but it takes a few minutes to complete with 100k rows.
for (row in 1:nrow(dt)) {
if(dt$indicator[row] == "x") {
dt$final_date[row] <- dt$date1[row]
} else {
dt$final_date[row] <- dt$date2[row]
}
}
Is there any more efficient way to do this with data.table functionality or anything else?
Upvotes: 1
Views: 56
Reputation: 481
Try this:
# necessary package
library(dplyr)
library(data.table)
# reproduce your data
dt <- data.table(
indicator = c("x", "y"),
date1 = c("20190111", "20190212", "20190512", "20190723"),
date2 = c("20190105", "20190215", "20190616", "20190623")
)
# create your variable final_date
dt[, final_date := case_when(indicator == "x" ~ date1,
TRUE ~ date2)]
Hope it helps
Upvotes: 0
Reputation: 1844
With data.table
, I would do something like this:
dt[, final_date := ifelse(indicator == "x", date1, date2)]
Really quick and simple! I suspect with a large set of data it will be faster than dplyr
as well as the solution you have, as data.table
mutates in place rather than creating a copy of the data.
Upvotes: 3
Reputation: 2467
With the dplyr
pipeline
> dt%>%mutate(final_data=if_else(indicator=="x",date1,date2))
indicator date1 date2 final_data
1 x 20190111 20190105 20190111
2 y 20190212 20190215 20190215
3 x 20190512 20190616 20190512
4 y 20190723 20190623 20190623
Upvotes: 0