Reputation: 1580
This seems like a simple enough function to write, but I think I'm misunderstanding the requirements for formal arguments / how R parses and evaluates a function.
I'm trying to write a function that converts any character vector of the form "%m/%d/%Y" (and belonging to data.frame df) to a date vector, and formats it as "%m/%d/%Y", as follows:
dateformat <- function(x) {
  df$x <- (format(as.Date(df$x, format = "%m/%d/%Y"), "%m/%d/%Y"))
}
I was thinking that...
dateformat(a)
... would just take the "a" as the actual argument for x and plug it into the function, thus resolving as:
df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))
However, I get the following error when running dateformat(a):
Error in as.Date.default(df$x, format = "%m/%d/%Y") :
do not know how to convert 'df$x' to class “Date”
Can someone please explain why my understanding of formal/actual arguments and/or R function parsing/evaluation is incorrect? Thank you.
Update
Of course, for all the variables I want to convert to dates (e.g., df$a, df$b, df$c), I could just write
df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))
df$b <- (format(as.Date(df$b, format = "%m/%d/%Y"), "%m/%d/%Y"))
df$c <- (format(as.Date(df$c, format = "%m/%d/%Y"), "%m/%d/%Y"))
But I'm looking to improve my coding skills by making a more general function to which I could feed a vector of variables. For instance, what if I had df$a to df$z, all character variables that I wanted to convert to date variables? After I write a proper function, I'd like to then perhaps run it like so:
for (n in letters) {
  dateformat(n)
}
Upvotes: 0
Views: 1590
Reputation: 59355
First, the format(...) function returns a character vector, not a date, so if x is a string, format(as.Date(x, format = "%m/%d/%Y"), "%m/%d/%Y") converts x to a date and then back to character, as in:
result <- format(as.Date("01/03/2014", format = "%m/%d/%Y"), "%m/%d/%Y")
result
# [1] "01/03/2014"
class(result)
# [1] "character"
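Second, assigning to an object such as df inside a function (that is, using it on the LHS of an expression) causes R to create a local copy of that object in the scope of the function. The object of the same name in the calling environment is not modified.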
a <- 2
f <- function(x) a <- x
f(3)
a
# [1] 2
Here, we set a variable, a, to 2. Then in the function we create a new variable, a, in the scope of the function, set it to x (3), and destroy it when the function returns. So in the global environment a is still 2.
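The same thing happens with a data frame column. As a minimal sketch (the names df_demo, no_effect, and convert_col are hypothetical, used only for illustration): assigning to a column inside a function changes a local copy, so one option is to have the function return the modified data frame and reassign it in the caller.

```r
# Hypothetical example data; not taken from the question.
df_demo <- data.frame(a = c("01/03/2014", "02/14/2014"),
                      stringsAsFactors = FALSE)

# Assigning inside the function only changes a local copy of df_demo.
no_effect <- function() {
  df_demo$a <- as.Date(df_demo$a, format = "%m/%d/%Y")
  invisible(NULL)
}
no_effect()
class_after_no_effect <- class(df_demo$a)
class_after_no_effect   # the global column is still character

# Return the modified copy and reassign it in the calling environment.
convert_col <- function(d, col) {
  d[[col]] <- as.Date(d[[col]], format = "%m/%d/%Y")
  d
}
df_demo <- convert_col(df_demo, "a")
class(df_demo$a)        # now a Date column
```

Passing the data frame in as an argument, rather than relying on a global df, also means the function can be reused on other data frames.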
If you insist on using a dateformat(...) function, this should work:
df <- data.frame(a=paste("01",1:10,"2014",sep="/"),
                 b=paste("02",11:20,"2014",sep="/"),
                 c=paste("03",21:30,"2014",sep="/"))
dateformat <- function(x) as.Date(df[[x]], format = "%m/%d/%Y")
for (n in letters[1:3]) df[[n]] <- dateformat(n)
sapply(df,class)
#      a      b      c
# "Date" "Date" "Date"
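The reason df[[x]] works here while df$x in the question did not: $ takes the name after it literally, so df$x looks for a column called "x" no matter what the variable x contains, whereas [[ evaluates its argument. That is also why the column name has to be passed as a string, as in dateformat("a"). A small illustration (the data frame d and the variable x are made up for this example):

```r
# Hypothetical data frame with a single column named "a".
d <- data.frame(a = c("01/03/2014", "02/14/2014"),
                stringsAsFactors = FALSE)
x <- "a"

is.null(d$x)             # TRUE: there is no column literally named "x"
d[[x]]                   # x is evaluated first, so this is column "a"
identical(d[[x]], d$a)   # TRUE
```

In the question, df$x was therefore NULL, which is what as.Date() reported it could not convert.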
This will be more efficient though:
df <- as.data.frame(lapply(df,as.Date,format="%m/%d/%Y"))
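Note that the lapply line converts every column of df. If only some columns hold dates, one possible variant is to assign into just those columns. This is a sketch under assumptions: df2, the id column, and date_cols are hypothetical names, not part of the original question.

```r
# Hypothetical data frame with one non-date column.
df2 <- data.frame(id = 1:3,
                  a = c("01/03/2014", "01/04/2014", "01/05/2014"),
                  b = c("02/11/2014", "02/12/2014", "02/13/2014"),
                  stringsAsFactors = FALSE)

# Convert only the listed columns; the other columns are left untouched.
date_cols <- c("a", "b")
df2[date_cols] <- lapply(df2[date_cols], as.Date, format = "%m/%d/%Y")

sapply(df2, class)
```

Assigning with df2[date_cols] <- ... keeps df2 a data frame, so no as.data.frame() call is needed in this form.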
Upvotes: 1