pseudorandom
pseudorandom

Reputation: 267

How to convert a date column with different character formats in R

In R version 3.6.0, I want to standardize a character column to a date format. The column has dates in three different forms like below:

time <- c("12/31/2019", "2020-01-01 00:00:10.555", "1517075645.917")
t <- data.frame(time)

I can see "1517075645.917" is a unix timestamp:

as.POSIXct(1517075645.917, origin = "1970-01-01", tz = "UTC")
# [1] "2018-01-27 17:54:05 UTC"

I can standardize the first and second element as below:

# version 1.5.6
library(lubridate)

parse_date_time(c("12/31/2019", "2020-01-01 00:00:10.555"), c("mdy", "ymd HMS"))
# [1] "2019-12-31 00:00:00 UTC" "2020-01-01 00:00:10 UTC

How do I standardize the whole column including the character "1517075645.917" as I cannot find a parsing format like "mdy" or "ymd HMS" for it?

Upvotes: 2

Views: 149

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389325

You can use if_else and apply the function based on pattern in the data.

library(dplyr)
library(lubridate)

t %>%
  mutate(time1 = suppressWarnings(if_else(grepl('^\\d+\\.\\d+$', time), 
              as.POSIXct(as.numeric(time), origin = "1970-01-01", tz = "UTC"), 
              parse_date_time(time, c("mdy", "ymd HMS")))))

#                     time               time1
#1              12/31/2019 2019-12-31 00:00:00
#2 2020-01-01 00:00:10.555 2020-01-01 00:00:10
#3          1517075645.917 2018-01-27 17:54:05

Upvotes: 0

Karthik S
Karthik S

Reputation: 11546

Does this work:

library(anytime)
library(stringr)
library(dplyr)
t %>% 
   mutate(time1 = case_when(str_detect(time, '\\d{10}\\.\\d{3}') ~  time, TRUE ~ NA_character_  )) %>% 
   mutate(time1 = as.POSIXct(as.numeric(time1), origin = '1970-01-01', tz = 'UTC')) %>% 
   mutate(time = anytime(time)) %>% mutate(time = coalesce(time, time1)) %>% select(-time1)
                 time
1 2019-12-31 00:00:00
2 2020-01-01 00:00:10
3 2018-01-27 23:24:05
> 

Upvotes: 1

Related Questions