Reputation: 353
I think I have a problem related to the escape character \ that I fail to handle.
Here is an excerpt from a DateTime column of a data.frame I have read with read_csv:
earthquakes[1:20,1]
Source: local data frame [20 x 1]
DateTime
(chr)
1 1964/01/01 12:21:55.40
2 1964/01/01 14:16:27.60
3 1964/01/01 14:18:53.90
4 1964/01/01 15:49:47.90
5 1964/01/01 17:26:43.50
My goal is to extract the years here. Manually doing
> format(strptime(c("1964/01/01 12:21:55.40","1964/01/01 12:21:55.40","1964/01/01 14:16:27.60"), "%Y/%m/%d %H:%M:%OS"), "%Y")
[1] "1964" "1964" "1964"
works as intended. However,
> strptime(earthquakes[1:5,1], "%Y/%m/%d %H:%M:%OS")
DateTime
NA
My hunch is that the problem is related to
as.character(earthquakes[1:5,1])
[1] "c(\"1964/01/01 12:21:55.40\", \"1964/01/01 14:16:27.60\", \"1964/01/01 14:18:53.90\", \"1964/01/01 15:49:47.90\", \"1964/01/01 17:26:43.50\")"
So it seems that the column in the data frame also contains the " characters, shown via the escape \". But I do not know how to handle this from here.
Given that the years are the first four characters, it would also seem OK (but less elegant, imho) to do
substr(earthquakes[1:5,1],1,4)
but that then accordingly just gives
[1] "c(\"1"
Clearly, I could do
substr(earthquakes[1:5,1],4,7)
but that would only work for the first row.
Upvotes: 1
Views: 365
Reputation: 70266
Apparently you have a dplyr::tbl_df, and by default in those, [ never simplifies a single column to an atomic vector (in contrast to [ applied to a base R data.frame). Hence, you could use either [[ or $ to extract the column, which will then be simplified to an atomic vector.
Some examples:
data(iris)
library(dplyr)
x <- tbl_df(iris)
x[1:5, 1]
#Source: local data frame [5 x 1]
#
# Sepal.Length
# (dbl)
#1 5.1
#2 4.9
#3 4.7
#4 4.6
#5 5.0
iris[1:5, 1]
#[1] 5.1 4.9 4.7 4.6 5.0
x[[1]][1:5]
#[1] 5.1 4.9 4.7 4.6 5.0
x$Sepal.Length[1:5]
#[1] 5.1 4.9 4.7 4.6 5.0
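Applied to the data in the question, this might look like the sketch below. It is only an illustration under some assumptions: the earthquakes object here is a small hypothetical stand-in built from the values shown in the question, and a base data.frame subset with drop = FALSE is used to mimic how [ keeps a one-column data frame on a tbl_df, so the example runs without dplyr. It also suggests why as.character() printed \" in the question: the whole column is deparsed into a single string, and the backslashes are only how R displays quotes inside that string.

```r
# Hypothetical stand-in for the asker's data (values copied from the question).
earthquakes <- data.frame(
  DateTime = c("1964/01/01 12:21:55.40",
               "1964/01/01 14:16:27.60",
               "1964/01/01 14:18:53.90"),
  stringsAsFactors = FALSE
)

# Assumption: drop = FALSE keeps a one-column data frame,
# mimicking what `[` returns for a tbl_df.
col_df <- earthquakes[1:3, 1, drop = FALSE]

# A data frame is a list, so as.character() deparses the whole column
# into ONE string of the form c("...", "...", "...").
# The \" seen when printing is just R's display of quotes inside a string;
# the data itself contains no backslashes.
deparsed <- as.character(col_df)
length(deparsed)
# [1] 1

# [[ (or $) returns the atomic character vector, which strptime() can parse.
dt <- earthquakes[["DateTime"]]
years <- format(strptime(dt, "%Y/%m/%d %H:%M:%OS", tz = "UTC"), "%Y")
years
# [1] "1964" "1964" "1964"

# The substr() shortcut from the question also works on the atomic vector.
years_substr <- substr(earthquakes$DateTime, 1, 4)
```

With the real tbl_df, earthquakes$DateTime or earthquakes[[1]] should slot into the same strptime() call.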
Upvotes: 3