I'm trying to work with a .csv file of water height and date. The date column comes in this format: "2007-03-15T18:54:00Z". I've tried using regex to remove the 'T' and the 'Z' so that I can manipulate the time for visualization, but I keep getting NA in all of my entries.
df <- fread("./IrishNationalTideGalway.csv",select = c("time (UTC)","Water_Level_LAT (metres)"))
data <- df[c(918121:994130)] #2008-2009 subset of data
colnames(data)[1] <- "time"
colnames(data)[2] <- "height"
data$time <- as.POSIXct( data$time , format = "%Y/%m/%d %I:%M:%S" , tz = "GMT")
I'm unsure how to get rid of the T and Z and then also how to put it into a format that I can manipulate.
Upvotes: 0
Views: 137
Reputation: 887138
We could convert to a datetime with lubridate and then apply as.Date (note that as.Date keeps only the date and drops the time of day):
library(dplyr)
df %>%
  mutate(DATE_2 = as.Date(lubridate::ymd_hms(DATE_1)))
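If the time of day is needed for the visualization, a minimal sketch is to stop at the datetime instead of converting to a date. The vector name `x` and the result name `parsed` are only illustrative; the strings are in the format from the question.

```r
library(lubridate)

# Timestamps in the ISO 8601 format shown in the question
x <- c("2007-03-15T18:54:00Z", "2008-03-15T18:54:00Z")

# ymd_hms() handles the "T" separator and the trailing "Z" itself
# and returns POSIXct values in UTC, so no regex cleanup is needed
parsed <- ymd_hms(x)

class(parsed)           # POSIXct, usable directly on a plot axis
format(parsed, "%H:%M") # the time of day is still available
```

Applied to the question's data, the same call would be data$time <- ymd_hms(data$time), assuming the column is still a character vector at that point.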
Upvotes: 2
Reputation: 1364
Here is a simple example that solves your problem:
library(dplyr)
library(stringi)
df <- data.frame(OBS = 1:2, DATE_1 = c("2007-03-15T18:54:00Z", "2008-03-15T18:54:00Z"))
# Strip the time part ("T18:54:00" or " 18:54:00"); as.Date() ignores the trailing "Z"
df2 <- df %>%
  mutate(DATE_2 = as.Date(stri_replace_all(DATE_1, replacement = "", regex = "T+(?:[01]\\d|2[0-3]):(?:[0-5]\\d):(?:[0-5]\\d)| (?:[01]\\d|2[0-3]):(?:[0-5]\\d):(?:[0-5]\\d)")))
df2
# OBS DATE_1 DATE_2
# 1 1 2007-03-15T18:54:00Z 2007-03-15
# 2 2 2008-03-15T18:54:00Z 2008-03-15
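Or, if you only want to remove the T and Z, try this (the result is still a character string, not a datetime):
library(stringr)
df3 <- df %>%
  mutate(DATE_2 = str_replace_all(DATE_1, regex("T|Z"), " ")) %>% # "T" and "Z" become spaces
  mutate(DATE_2 = str_trim(DATE_2, side = "right")) # drop the trailing space left by "Z"
df3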
# OBS DATE_1 DATE_2
# 1 1 2007-03-15T18:54:00Z 2007-03-15 18:54:00
# 2 2 2008-03-15T18:54:00Z 2008-03-15 18:54:00
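As for why the original attempt gave NA everywhere: the format string has to describe the text exactly, and "%Y/%m/%d %I:%M:%S" expects slashes and a space where the data has hyphens and a "T". A minimal base R sketch, treating "T" and "Z" as literal characters in the format (the names `x`, `bad` and `good` are only illustrative):

```r
# Timestamps in the format from the question
x <- c("2007-03-15T18:54:00Z", "2008-03-15T18:54:00Z")

# The format used in the question does not match the text, so every value is NA
bad <- as.POSIXct(x, format = "%Y/%m/%d %I:%M:%S", tz = "GMT")

# Hyphens in the date, literal "T" and "Z", and %H for the 24-hour clock
good <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

is.na(bad)  # all TRUE
good        # POSIXct values that can be plotted or subtracted
```

With this approach no regex is needed at all, since the parsing step consumes the T and Z directly.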
Upvotes: 0