Rime
Rime

Reputation: 53

How to convert irregular times into XTS object using R

I have the following data.frame that I would like to covert into an xts() object but have been breaking my head trying to figure out how to format the times:

data.frame

The data is arranged from recent (at the top) to oldest (at the bottom). The problem is that every row is not consistent with the format so I am having trouble trying to format it in a way that each row will display the correct date & time.

Desired output for Date/Time Column:

01/05/17 02:55 PM
01/05/17 11:40 AM
01/05/17 07:00 AM
12/30/16 05:50 PM
12/29/16 07:03 AM
12/30/16 07:00 AM

DATA:

data <- structure(list(Date = c("Jan-05-17 02:55PM", "11:40AM", "07:00AM", 
"Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%", 
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday", 
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire", 
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%", 
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)", 
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date", 
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 1

Views: 418

Answers (3)

Matt Dancho
Matt Dancho

Reputation: 7308

Here's a solution using the tidyquant package, which loads all packages you need to solve this problem. Same as the other solutions, you need to have a consistent date with structure such as:

"Jan-05-17 02:55 PM"

Using the lubridate package, you can convert to POSIXct class with the mdy_hm() function as follows:

"Jan-05-17 02:55 PM" %>% lubridate::mdy_hm()
> "2017-01-05 14:55:00 UTC"

Where the lubridate::mdy_hm() function stands for month-day-year hour-minute. The output is the date in the correct date-time class.

The tidyquant package has a convenient function, as_xts(), with an argument, date_col that when specified converts the data.frame date column to xts row names. I use the pipe (%>%) to make the code more readable and to show the workflow, and the dplyr::mutate() function which changes the Date column to the POSIXct class using the lubridate::mdy_hm() function. The final workflow looks like this:

data %>%
    mutate(Date = lubridate::mdy_hm(Date)) %>%
    as_xts(date_col = Date)

Make sure the Date column has all rows with a valid format such as "Jan-05-17 02:55 PM" before trying the code snippet, otherwise you will get a parsing error at the lubridate::mdy_hm() function.

Data I used to test is below:

data <- structure(list(Date = c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Jan-05-17 07:00 AM", 
                            "Dec-30-16 05:50 PM", "Dec-29-16 07:03 AM", "Dec-29-16 07:00 AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%", 
                                                                                           "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday", 
                                                                                           "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire", 
                                                                                           "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%", 
                                                                                           "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)", 
                                                                                           "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
                            ), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date", 
                                                                                                       "News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 1

G. Grothendieck
G. Grothendieck

Reputation: 270348

Use sub to replace a digit at the start of Date with NA followed by space followed by the digit. From that use read.table to create a 2 column data frame with the date (or NA) in column 1 and the time in column 2. Fill in the NA values using na.locf giving DF2. Now cbind DF2 and data[-1] reading the data.frame so created using read.zoo. Finally convert the resulting "zoo" object to "xts".

DF2 <- na.locf(read.table(text = sub("^(\\d)", "NA \\1", data$Date)))
z <- read.zoo(cbind(DF2, data[-1]), index = 1:2, tz = "", format = "%b-%d-%y %I:%M%p")
as.xts(z)

Upvotes: 2

FXQuantTrader
FXQuantTrader

Reputation: 6891

Assuming you have a typo in your last line of your desired date-time output, which I guess you mean is 12/29/16 07:00 AM, then when you have an element in the column Date that is missing a date, take the most recently known date and roll "backwards":

library(stringr)

l_datetime <- str_split(data$Date, " ")
data$ymd <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[1]], NA)))
data$time <- unlist(lapply(l_datetime, function(x) ifelse(length(x) == 2, x[[2]], x[[1]])))
# Roll "backward" the latest known date for elements of column `Date` that have missing YYYY-MM-DD values
data$ymd <- na.locf(data$ymd) 
# Carefully parse the time strings allowing for AM/PM:
psx_date <- as.POSIXct(paste(data$ymd, data$time), format = "%b-%d-%y %I:%M%p")

x_data <- xts(x = data[, c("News", "Symbol")], order.by = psx_date)
# > x_data
#                                                                                                         News                                  Symbol
# 2016-12-29 07:00:00 "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"                                           "ETRM"
# 2016-12-29 07:03:00 "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)"                                              "ETRM"
# 2016-12-30 17:50:00 "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%"                                    "ETRM"
# 2017-01-05 07:00:00 "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire" "ETRM"
# 2017-01-05 11:40:00 "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday"                                                       "ETRM"
# 2017-01-05 14:55:00 "ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits  +89.95%"                           "ETRM"

Upvotes: 2

Related Questions