Reputation: 3354
This is a pseudo followup to this question: Why is ggplot graphing null percentage data points?
Let's say this is my dataset:
Date AE AA AEF Percent
1/1/2012 1211 1000 3556 0.03
1/2/2012 100 2000 3221 0.43
1/3/2012 3423 10000 2343 0.54
1/4/2012 10000 3000 332 0.43
1/5/2012 2342 500 4435 0.43
1/6/2012 2342 800 2342 0.23
1/7/2012 2342 1500 1231 0.12
1/8/2012 111 2300 333
1/9/2012 1231 1313 3433
1/10/2012 3453 5654 222
1/11/2012 3453 3453 454
1/12/2012 5654 7685 3452
> str(data)
'data.frame': 12 obs. of 5 variables:
$ Date : Factor w/ 12 levels "10/11/2012","10/12/2012",..: 1 2 3 4 5 6 7 8 9 10 ...
$ AE : int 1211 100 3423 10000 2342 2342 2342 111 1231 3453 ...
$ AA : int 1000 2000 10000 3000 500 800 1500 2300 1313 5654 ...
$ AEF : int 3556 3221 2343 332 4435 2342 1231 333 3433 222 ...
$ Percent: num 0.03 0.43 0.54 0.43 0.43 0.23 0.12 NA NA NA ...
I need something that tells me the 'Date' column is a Date type as opposed to a numeric or character type (this is because I have to convert the 'Date' column of the input data into an actual Date with as.Date(), assuming that I do not know the column names of the data set).
is.numeric(data[[1]]) returns FALSE
is.character(data[[1]]) returns FALSE
I made the 'Date' column in Excel, formatting the column in the 'Date' format, then saved the file as a csv. What type is this in R? I seek an expression similar to the above that returns TRUE.
Upvotes: 42
Views: 72472
Reputation: 21
I know this is an old question and that I am not providing a self-developed answer, but some non-experts in R (like myself) might find it useful to use the skim function (from the skimr package) to check whether one or more variables of a data frame (df) are in the Date format.
The syntax is just skim(df), or skimr::skim(df) if one does not want to load the package just for this check.
The output is a very detailed summary of the data frame in which variables are grouped by format (character, Date, numeric...) and additional info is provided (e.g. whether there are missing values, descriptive statistics, etc.).
Upvotes: 0
Reputation: 5229
The OP clearly asks for just a check:
I need something to tell that the 'Date' column is a Date type
So how many date classes come with R? Exactly two: Date and POSIXt (excluding their derivatives like POSIXct and POSIXlt).
So we can just check on that, and make it more robust than the answers already given:
is.Date <- function(x) {
  inherits(x, c("Date", "POSIXt"))
}
As robust as it gets.
is.Date(as.Date("2020-02-02"))
#> [1] TRUE
is.Date(as.POSIXct("2020-02-02"))
#> [1] TRUE
is.Date(as.POSIXlt("2020-02-02"))
#> [1] TRUE
If you want to know if a column would be successfully transformable/coercible to a Date type, then that's another question. This answers what was requested: 'to tell that [...] is a Date type'.
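Since the question says the column names are unknown, one way to use such a check is to apply it to every column at once. A minimal sketch, assuming a made-up data frame (df and its column names are hypothetical, not the OP's data):

```r
# Same check as above, repeated so the snippet is self-contained.
is.Date <- function(x) inherits(x, c("Date", "POSIXt"))

# Hypothetical data frame for illustration only.
df <- data.frame(
  when  = as.Date(c("2012-01-01", "2012-01-02")),
  stamp = as.POSIXct(c("2012-01-01 10:00:00", "2012-01-02 11:30:00"), tz = "UTC"),
  label = c("1/1/2012", "1/2/2012"),
  n     = c(1L, 2L)
)

# vapply() returns one logical per column, named after the column.
date_cols <- vapply(df, is.Date, logical(1))
names(df)[date_cols]
```

Here label holds date-like text but is not reported, because the check looks only at the class, not at the contents.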
Upvotes: 8
Reputation: 850
I will refer to a simple example and I hope it can be generalized. Say that you have a date
d1<-Sys.Date()
d1
"2020-02-12"
deparse(d1)
"structure(18304, class = \"Date\")"
Thus
grep("Date",deparse(d1))>=1
TRUE
alternatively use
class(d1)
"Date"
Upvotes: 2
Reputation: 105
This is my way of doing it. Works most of the time but needs improvement
MissLt <- function(x, ratio = 0.5) {
  sum(is.na(x)) / length(x) < ratio
}
IS.Date <- function(x, addformat = NULL, exactformat = NULL) {
  if (is.null(exactformat)) {
    format <- c("%m/%d/%Y", "%m-%d-%Y", "%Y/%m/%d", "%Y-%m-%d", addformat)
    y <- as.Date(as.character(x), format = format)
    MissLt(y, ratio = 1 - (1 / length(y)))
  } else {
    y <- as.Date(as.character(x), format = exactformat)
    MissLt(y, ratio = 1 - (1 / length(y)))
  }
}
sapply(data, IS.Date)
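One thing to be aware of: as far as I know, when as.Date() is given a vector of formats, the formats are recycled along the elements of x rather than each format being tried on every element. A sketch of a helper that tries each format in turn (parse_any_format is a hypothetical name, not part of the code above):

```r
# Hypothetical helper: try the formats one after another and keep the
# first successful parse for each element.
parse_any_format <- function(x,
                             formats = c("%m/%d/%Y", "%m-%d-%Y",
                                         "%Y/%m/%d", "%Y-%m-%d")) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  for (f in formats) {
    todo <- is.na(out)                        # elements not parsed yet
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

# Made-up input mixing two layouts and one non-date.
parsed <- parse_any_format(c("1/8/2012", "2012-01-09", "abc"))
parsed
```

A column could then be treated as a date column when, say, most of parsed is not NA.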
Upvotes: -1
Reputation: 1871
I know this question is old, but I did want to mention that there is now a function in the lubridate package for is.Date and also is.POSIXt:
library(lubridate)
sapply(list(as.Date('2000-01-01'), 123, 'ABC'), is.Date)
[1] TRUE FALSE FALSE
Upvotes: 21
Reputation: 75
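A function that I created based on the answers here, and that I am using now:
is.Date <- function(date) {
  # For each element, try every format; TRUE if at least one of them parses.
  parses <- sapply(date, function(x)
    !all(is.na(as.Date(
      as.character(x),
      format = c("%d/%m/%Y", "%d-%m-%Y", "%Y/%m/%d", "%Y-%m-%d")
    ))))
  # if() needs a single logical, so collapse the per-element results;
  # all() is an assumption here: every element must look like a date.
  all(parses)
}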
Upvotes: 2
Reputation: 161
To work with dates I use a function to identify if the strings are dates, and if they are, convert them to a predefined format (in this case I chose '%d/%m/%Y'):
standarDates <- function(string) {
  patterns <- c('[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]',
                '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]',
                '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')
  formatdates <- c('%Y/%m/%d', '%d/%m/%Y', '%Y-%m-%d')
  standardformat <- '%d/%m/%Y'
  for (i in seq_along(patterns)) {
    if (grepl(patterns[i], string)) {
      aux <- as.Date(string, format = formatdates[i])
      if (!is.na(aux)) {
        return(format(aux, standardformat))
      }
    }
  }
  return(FALSE)
}
Suppose you have the vector
a=c("2018-24-16","1587/03/16","fhjfmk","9885/04/16")
> sapply(a,standarDates)
2018-24-16 1587/03/16 fhjfmk 9885/04/16
"FALSE" "16/03/1587" "FALSE" "16/04/9885"
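With the command
"FALSE" %in% sapply(a, standarDates)
[1] TRUE
you can check whether any element failed to parse: TRUE means at least one element is not a date, so all the elements are dates only when this returns FALSE.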
The advantage of this function is that you can add more patterns and date formats according to the data you are working with, and end up with a standard format for all those dates. (The disadvantage is that it isn't exactly what the question is asking.)
I hope this helps.
Upvotes: 4
Reputation: 5237
Use inherits to detect if the argument has datatype Date:
is.date <- function(x) inherits(x, 'Date')
sapply(list(as.Date('2000-01-01'), 123, 'ABC'), is.date)
#[1] TRUE FALSE FALSE
If you want to check if a character argument can be converted to Date, then use this:
is.convertible.to.date <- function(x) !is.na(as.Date(as.character(x), tz = 'UTC', format = '%Y-%m-%d'))
sapply(list('2000-01-01', 123, 'ABC'), is.convertible.to.date)
# [1] TRUE FALSE FALSE
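For a whole column, the per-element results can be collapsed into a single answer. A sketch that ignores missing values (all_convertible is a hypothetical wrapper, and treating an all-NA column as "not a date" is an assumption):

```r
# Repeated from above so the snippet is self-contained.
is.convertible.to.date <- function(x) !is.na(as.Date(as.character(x), tz = 'UTC', format = '%Y-%m-%d'))

# Hypothetical wrapper: TRUE when there is at least one non-missing value
# and every non-missing value converts.
all_convertible <- function(col) {
  present <- !is.na(col)
  any(present) && all(is.convertible.to.date(col[present]))
}

with_gap <- all_convertible(c("2000-01-01", NA, "2000-02-29"))
with_bad <- all_convertible(c("2000-01-01", "ABC"))
all_na   <- all_convertible(c(NA, NA))
```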
Upvotes: 43
Reputation: 93938
You could try to coerce all the columns with as.Date and see which ones succeed. You would need to specify the format you expect dates to be in, though. E.g.:
data <- data.frame(
  Date = c("10/11/2012", "10/12/2012"),
  AE = c(1211, 100),
  Percent = c(0.03, 0.43)
)
sapply(data, function(x) !all(is.na(as.Date(as.character(x), format = "%d/%m/%Y"))))
#Date AE Percent
#TRUE FALSE FALSE
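Once the date-like columns are found, the same format can be used to convert them in place. A sketch under the assumption that every date column uses one single format, stored here in fmt (a name introduced only for illustration):

```r
# Example data as above; stringsAsFactors = TRUE mimics the factor column
# from the question.
data <- data.frame(
  Date = c("10/11/2012", "10/12/2012"),
  AE = c(1211, 100),
  Percent = c(0.03, 0.43),
  stringsAsFactors = TRUE
)

fmt <- "%d/%m/%Y"  # assumed format of the date column(s)

# One logical per column: does at least one value parse with fmt?
is_date_col <- vapply(
  data,
  function(x) !all(is.na(as.Date(as.character(x), format = fmt))),
  logical(1)
)

# Convert only the detected columns, without naming them explicitly.
data[is_date_col] <- lapply(
  data[is_date_col],
  function(x) as.Date(as.character(x), format = fmt)
)

sapply(data, class)
```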
Upvotes: 19