Anakin Skywalker
Anakin Skywalker

Reputation: 2520

Splitting a column into date/time in tidyverse

I acknowledge that there are multiple similar questions, but I did not find so far an answer, which works for me, maybe because of AM/PM. I would like not to remove the latter.

I have a column Trip Start Timestamp, which looks like

12/01/2019 12:30:00 AM
12/01/2019 12:31:00 AM
12/01/2019 12:32:00 AM 

I am trying to remove AM/PM and split into two variables Start date and Start time.

Ideal output:

Start date Start time
12/01/2019 12:30:00 AM

I would like to read it as time series.

My best guess so far

Date <- format(as.POSIXct(strptime(taxi_2020$`Trip Start Timestamp`, "%d/%m/%Y %H:%M:S")), format = "%m/%d/%Y")
Time <- format(as.POSIXct(strptime(taxi_2020$`Trip Start Timestamp`, "%d/%m/%Y %H:%M:S")), format = "%H:%M:S")
head(Date)
head(Time)

Gives me

[1] NA NA NA NA NA NA
[1] NA NA NA NA NA NA

Update

It looks like columns with time and date has some problems with format.

dput can be found here

So far the solution by @Ronak Shah works. Technically I finally separated the data, but probably identified another problem with UTF-8

Upvotes: 1

Views: 1105

Answers (4)

Ian Campbell
Ian Campbell

Reputation: 24790

Since you tagged this with tidyverse, here's a simple approach with lubridate:

library(dplyr)
library(lubridate)
data %>%
  mutate(Date = as.Date(mdy_hms(`Trip Start Timestamp`)), 
         Time = format(mdy_hms(`Trip Start Timestamp`), "%I:%M:%S %p"))
#    Trip Start Timestamp       Date        Time
#1 12/01/2019 12:30:00 AM 2019-12-01 12:30:00 AM
#2 12/01/2019 12:31:00 AM 2019-12-01 12:31:00 AM
#3 12/01/2019 12:32:00 AM 2019-12-01 12:32:00 AM

Example Data

data <- structure(list(`Trip Start Timestamp` = c("12/01/2019 12:30:00 AM", 
"12/01/2019 12:31:00 AM", "12/01/2019 12:32:00 AM")), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 3

akrun
akrun

Reputation: 887118

We can use base R to split

out <- do.call(rbind.data.frame, strsplit(data[[1]],
        "(?<=[0-9]) (?=[0-9])", perl = TRUE))
names(out) <- c('Start Date', 'Start Time')
out
#  Start Date  Start Time
#1 12/01/2019 12:30:00 AM
#2 12/01/2019 12:31:00 AM
#3 12/01/2019 12:32:00 AM

data

data <- structure(list(`Trip Start Timestamp` = c("12/01/2019 12:30:00 AM", 
"12/01/2019 12:31:00 AM", "12/01/2019 12:32:00 AM")), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

Note that keeping the output in the format that you have shown would return columns as characters/factors.

You can split the data on whitespace using tidyr::separate

tidyr::separate(data, `Trip Start Timestamp`, c('Start Date', 'Start Time'), 
                       sep = ' ', extra = 'merge')

#  Start Date  Start Time
#1 12/01/2019 12:30:00 AM
#2 12/01/2019 12:31:00 AM
#3 12/01/2019 12:32:00 AM

Similarly, you can use extract :

tidyr::extract(data, `Trip Start Timestamp`, c('Start Date', 'Start Time'), 
               regex = '(.*?)\\s(.*)')

Upvotes: 2

r2evans
r2evans

Reputation: 160447

tidyverse

dat %>%
  mutate(
    ## option 1
    psx = as.POSIXct(V1, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"), 
    Date = format(psx, format = "%m/%d/%Y"), 
    Time = format(psx, format = "%I:%M:%S %p"), 
    ## option 2
    Date2 = sub("\\s.*", "", V1), 
    Time2 = sub("^\\S*\\s", "", V1)
  )
#                        V1                 psx       Date        Time      Date2        Time2
# 1  12/01/2019 12:30:00 AM 2019-12-01 12:30:00 12/01/2019 12:30:00 PM 12/01/2019  12:30:00 AM
# 2  12/01/2019 12:31:00 AM 2019-12-01 12:31:00 12/01/2019 12:31:00 PM 12/01/2019  12:31:00 AM
# 3 12/01/2019 12:32:00 AM  2019-12-01 12:32:00 12/01/2019 12:32:00 PM 12/01/2019 12:32:00 AM 

Data

dat <- read.table(header = FALSE, text = "
12/01/2019 12:30:00 AM
12/01/2019 12:31:00 AM
12/01/2019 12:32:00 AM ", sep = "|")

Upvotes: 2

Related Questions