Reputation: 361
I have got a large dataframe containing medical data (my.medical.data
).
A number of columns contain dates (e.g. hospital admission date), the names of each of these columns end in "_date".
I would like to apply the lubridate::dmy()
function to the columns that contain dates and overwrite my original dataframe with the output of this function.
It would be great to have a general solution that can be applied using any function, not just my dmy()
example.
Essentially, I want to apply the following to all of my date columns:
my.medical.data$admission_date <- lubridate::dmy(my.medical.data$admission_date)
my.medical.data$operation_date <- lubridate::dmy(my.medical.data$operation_date)
etc.
I've tried this:
date.columns <- select(ICB, ends_with("_date"))
date.names <- names(date.columns)
date.columns <- transmute_at(my.medical.data, date.names, lubridate::dmy)
Now date.columns
contains my date columns, in the "Date" format, rather than the original factors. Now I want to replace the date columns in my.medical.data
with the new columns in the correct format.
my.medical.data.new <- full_join(x = my.medical.data, y = date.columns)
Now I get:
Error: cannot join a Date object with an object that is not a Date object
I'm a bit of an R novice, but I suspect that there is an easier way to do this (e.g. process the original dataframe directly), or maybe a correct way to join / merge the two dataframes.
Upvotes: 0
Views: 922
Reputation: 6226
Several ways to do that.
If you work with voluminous data, I think data.table
is the best approach (will bring you flexibility, speed and memory efficiency)
You can use the :=
(update by reference operator) together with lapplỳ
to apply lubridate::ymd
to all columns defined in .SDcols
dimension
library(data.table)
setDT(my.medical.data)
cols_to_change <- endsWith("_date", colnames(my.medical.date))
my.medical.data[, c(cols_to_change) := lapply(.SD, lubridate::ymd), .SDcols = cols_to_change]
A standard lapply
can also help. You could try something like that (I did not test it)
my.medical.data[, cols_to_change] <- lapply(cols_to_change, function(d) lubridate::ymd(my.medical.data[,d]))
Upvotes: 1
Reputation: 13680
As usual it's difficult to answer without an example dataset, but this should do the work:
library(dplyr)
my.medical.data <- my.medical.data %>%
mutate_at(vars(ends_with('_date')), lubridate::dmy)
This will mutate in place each variable that end with '_date', applying the function. It can also apply multiple functions. See ?mutate_at
(which is also the help for mutate_if
)
Upvotes: 2