Reputation: 2520
I acknowledge that there are multiple similar questions, but I did not find so far an answer, which works for me, maybe because of AM/PM. I would like not to remove the latter.
I have a column Trip Start Timestamp
, which looks like
12/01/2019 12:30:00 AM
12/01/2019 12:31:00 AM
12/01/2019 12:32:00 AM
I am trying to remove AM/PM and split into two variables Start date
and Start time
.
Ideal output:
Start date Start time
12/01/2019 12:30:00 AM
I would like to read it as time series.
My best guess so far
Date <- format(as.POSIXct(strptime(taxi_2020$`Trip Start Timestamp`, "%d/%m/%Y %H:%M:S")), format = "%m/%d/%Y")
Time <- format(as.POSIXct(strptime(taxi_2020$`Trip Start Timestamp`, "%d/%m/%Y %H:%M:S")), format = "%H:%M:S")
head(Date)
head(Time)
Gives me
[1] NA NA NA NA NA NA
[1] NA NA NA NA NA NA
Update
It looks like columns with time and date has some problems with format.
dput
can be found here
So far the solution by @Ronak Shah works. Technically I finally separated the data, but probably identified another problem with UTF-8
Upvotes: 1
Views: 1105
Reputation: 24790
Since you tagged this with tidyverse
, here's a simple approach with lubridate
:
library(dplyr)
library(lubridate)
data %>%
mutate(Date = as.Date(mdy_hms(`Trip Start Timestamp`)),
Time = format(mdy_hms(`Trip Start Timestamp`), "%I:%M:%S %p"))
# Trip Start Timestamp Date Time
#1 12/01/2019 12:30:00 AM 2019-12-01 12:30:00 AM
#2 12/01/2019 12:31:00 AM 2019-12-01 12:31:00 AM
#3 12/01/2019 12:32:00 AM 2019-12-01 12:32:00 AM
Example Data
data <- structure(list(`Trip Start Timestamp` = c("12/01/2019 12:30:00 AM",
"12/01/2019 12:31:00 AM", "12/01/2019 12:32:00 AM")), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 3
Reputation: 887118
We can use base R
to split
out <- do.call(rbind.data.frame, strsplit(data[[1]],
"(?<=[0-9]) (?=[0-9])", perl = TRUE))
names(out) <- c('Start Date', 'Start Time')
out
# Start Date Start Time
#1 12/01/2019 12:30:00 AM
#2 12/01/2019 12:31:00 AM
#3 12/01/2019 12:32:00 AM
data <- structure(list(`Trip Start Timestamp` = c("12/01/2019 12:30:00 AM",
"12/01/2019 12:31:00 AM", "12/01/2019 12:32:00 AM")), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 1
Reputation: 388982
Note that keeping the output in the format that you have shown would return columns as characters/factors.
You can split the data on whitespace using tidyr::separate
tidyr::separate(data, `Trip Start Timestamp`, c('Start Date', 'Start Time'),
sep = ' ', extra = 'merge')
# Start Date Start Time
#1 12/01/2019 12:30:00 AM
#2 12/01/2019 12:31:00 AM
#3 12/01/2019 12:32:00 AM
Similarly, you can use extract
:
tidyr::extract(data, `Trip Start Timestamp`, c('Start Date', 'Start Time'),
regex = '(.*?)\\s(.*)')
Upvotes: 2
Reputation: 160447
dat %>%
mutate(
## option 1
psx = as.POSIXct(V1, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"),
Date = format(psx, format = "%m/%d/%Y"),
Time = format(psx, format = "%I:%M:%S %p"),
## option 2
Date2 = sub("\\s.*", "", V1),
Time2 = sub("^\\S*\\s", "", V1)
)
# V1 psx Date Time Date2 Time2
# 1 12/01/2019 12:30:00 AM 2019-12-01 12:30:00 12/01/2019 12:30:00 PM 12/01/2019 12:30:00 AM
# 2 12/01/2019 12:31:00 AM 2019-12-01 12:31:00 12/01/2019 12:31:00 PM 12/01/2019 12:31:00 AM
# 3 12/01/2019 12:32:00 AM 2019-12-01 12:32:00 12/01/2019 12:32:00 PM 12/01/2019 12:32:00 AM
Data
dat <- read.table(header = FALSE, text = "
12/01/2019 12:30:00 AM
12/01/2019 12:31:00 AM
12/01/2019 12:32:00 AM ", sep = "|")
Upvotes: 2