Reputation: 17577
I'm trying to compute a function of date components (year, month, day). I am reading data into a data frame, parsing the strings into dates, and then, I hope, doing some arithmetic on the components of the dates.
Here is my data file:
timestamp,value
"2014-01-23 12:30:00",123
"2015-11-30 15:45:00",456
"2016-07-29 09:15:00",789
Here is my R session: (I am working with R 3.0.2 on Ubuntu 14.04)
> x <- read.csv ("foo.csv", row.names=NULL, header=T)
> x
timestamp value
1 2014-01-23 12:30:00 123
2 2015-11-30 15:45:00 456
3 2016-07-29 09:15:00 789
> x1 <- as.vector (x[, 1])
> x1
[1] "2014-01-23 12:30:00" "2015-11-30 15:45:00" "2016-07-29 09:15:00"
> x1.t <- strptime (x1, "%Y-%m-%d %H:%M:%S")
> x1.t
[1] "2014-01-23 12:30:00" "2015-11-30 15:45:00" "2016-07-29 09:15:00"
> x1.t.combo <- sapply (x1.t, function (t) { (t$year - 114)*12 + (t$mon + 1) })
Error in t$year : $ operator is invalid for atomic vectors
Applying the $ to elements of x1.t seems to work as expected, e.g. (x1.t[1]$year - 114)*12 + (x1.t[1]$mon + 1) yields 1. What is causing the error message?
I find that (x1.t$year - 114)*12 + (x1.t$mon + 1) yields 1 23 31 as expected, so I guess it's not really necessary to figure out the business with sapply, but I'd still like to know, in the interest of understanding what's going on.
Upvotes: 2
Views: 129
Reputation: 263301
Both sapply and lapply yield the same error because x1.t is a POSIXlt object, which is stored as a list of component vectors, and they pass the elements of that list one by one rather than the individual date-times. The first one is a 3-element (atomic, not recursive) vector of seconds:
> x1.t[[1]] # same as x1.t[['sec']]
[1] 0 0 0
Furthermore, it (like all the other components) is passed without its name. So even the year component, which is the 6th element of the list, will not carry the name 'year' by the time it reaches the body of that anonymous function.
> dput(x1.t)
structure(list(sec = c(0, 0, 0), min = c(30L, 45L, 15L), hour = c(12L,
15L, 9L), mday = c(23L, 30L, 29L), mon = c(0L, 10L, 6L), year = 114:116,
wday = c(4L, 1L, 5L), yday = c(22L, 333L, 210L), isdst = c(0L,
0L, 1L), zone = c("PST", "PST", "PDT"), gmtoff = c(NA_integer_,
NA_integer_, NA_integer_)), .Names = c("sec", "min", "hour",
"mday", "mon", "year", "wday", "yday", "isdst", "zone", "gmtoff"
), class = c("POSIXlt", "POSIXt"))
This is akin to the error people make when they think that the first element in a data.frame is the first row, or that the length of a data.frame is the number of cases (when it is actually the number of columns).
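The list structure described above can be inspected directly by stripping the class with unclass(). This is only a minimal sketch: it reuses the three timestamps from the question and fixes tz = "UTC", which is an assumption made purely so the result does not depend on the local time zone.

```r
# Parse the question's timestamps; tz = "UTC" is an assumption for reproducibility.
x1 <- c("2014-01-23 12:30:00", "2015-11-30 15:45:00", "2016-07-29 09:15:00")
x1.t <- strptime(x1, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Strip the class to expose the underlying list of component vectors.
parts <- unclass(x1.t)
is.list(parts)        # TRUE: one vector per component, not one entry per date
names(parts)[1:6]     # "sec" "min" "hour" "mday" "mon" "year"

# Each component is a plain atomic vector, so `$` cannot be applied to it.
is.atomic(parts$sec)  # TRUE
parts$year            # 114 115 116 (years since 1900)

# Because the components are vectors, the arithmetic is already vectorized.
combo <- (x1.t$year - 114) * 12 + (x1.t$mon + 1)
combo                 # 1 23 31
```

Since every component holds one value per date, the vectorized expression on the last lines gives the per-date result without any *apply call.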
Upvotes: 2
Reputation: 1929
The problem you're running into is that a POSIXlt object is itself made up of multiple components, so the *apply functions apply the function to each component rather than to each date. You can see the components with unlist(x1.t).
So you have to "go around" it. There's a simple way, where you don't have to convert it first:
> x <- c("2014-01-23 12:30:00", "2015-11-30 15:45:00")
> x
[1] "2014-01-23 12:30:00" "2015-11-30 15:45:00"
> y <- sapply (x, function (t) { t <- as.POSIXlt(t); (t$year - 114)*12 + (t$mon + 1) })
> y
2014-01-23 12:30:00 2015-11-30 15:45:00
1 23
But if you really want to convert it first, then you have to turn it into either numeric or character first and then once again convert back inside the function. Something like this:
> x <- c(strptime("2014-01-23 12:30:00", "%Y-%m-%d %H:%M:%S"), strptime("2015-11-30 15:45:00", "%Y-%m-%d %H:%M:%S"))
> x
[1] "2014-01-23 12:30:00 EET" "2015-11-30 15:45:00 EET"
> y <- sapply (as.numeric(x), function (t) { t <- as.POSIXlt(t, origin = "1970-01-01"); (t$year - 114)*12 + (t$mon + 1) })
> y
[1] 1 23
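Another way to "go around" it, building on the question's observation that x1.t[1]$year works, is to iterate over positions and subset the POSIXlt object inside the function. This is a sketch, not the only option: the helper name months_since_2014 is hypothetical, and tz = "UTC" is an assumption used only to keep the example independent of the local time zone.

```r
# Timestamps from the question; tz = "UTC" is an assumption for reproducibility.
x1 <- c("2014-01-23 12:30:00", "2015-11-30 15:45:00", "2016-07-29 09:15:00")
x1.t <- strptime(x1, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Hypothetical helper: takes a position, not a date.
# x1.t[i] is a one-element POSIXlt, so `$year` and `$mon` are valid on it.
months_since_2014 <- function(i) {
  t <- x1.t[i]
  (t$year - 114) * 12 + (t$mon + 1)
}

# Iterate over the indices of the original character vector.
y <- vapply(seq_along(x1), months_since_2014, numeric(1))
y  # 1 23 31
```

This avoids the round trip through numeric or character, at the cost of subsetting once per element. For a computation this simple, the vectorized expression from the question is still shorter.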
Upvotes: 2