Reputation: 417
I have a character object containing dates that looks like this;
Dec-13-17 06:38AM
Dec-11-17 07:30AM
06:35AM
06:20AM
06:15AM
Dec-09-17 01:22PM
Dec-08-17 11:46AM
05:20AM
Dec-08-17 09:38AM
Problem I have is that a few of these entries are missing the date value. The 6:35, 6:20 & 5:15 are all missing Dec-11-17 and the 05:20AM is missing Dec-08-17.
I wrote quite a lengthy ifelse statement that would always look back to the next complete date by using str_count()
to find the next complete datetime then using gsub to extract the date only part then paste it with the existing time entry like this;
paste0(gsub( " .*$", "", datetime), <existingdate>)
Its quite messy as it contains lots of nested ifelse statements but the problem is that I never know how many rows above the incomplete entry I have to look to extract the date from. As this data is scraped from the web my nested if else statements could be very long to accommodate this and I am sure there is a better way of doing this.
Any help much appreciated!
Upvotes: 2
Views: 90
Reputation: 269624
Assuming that the input is s
below, extract the date returning it or NA if none giving dat
. Also extract the time giving tim
. Run na.locf
on the date to fill in the missing dates and paste the date and time together.
s <- c("Dec-13-17 06:38AM", "Dec-11-17 07:30AM", "06:35AM", "06:20AM",
"06:15AM", "Dec-09-17 01:22PM", "Dec-08-17 11:46AM", "05:20AM",
"Dec-08-17 09:38AM")
library(zoo)
dat <- ifelse(grepl("-", s), sub(" .*", "", s), NA)
tim = sub(".* ", "", s)
dattim <- paste(na.locf(dat), tim)
as.POSIXct(dattim, format = "%b-%d-%y %I:%M%p")
giving:
[1] "2017-12-13 06:38:00 EST" "2017-12-11 07:30:00 EST"
[3] "2017-12-11 06:35:00 EST" "2017-12-11 06:20:00 EST"
[5] "2017-12-11 06:15:00 EST" "2017-12-09 13:22:00 EST"
[7] "2017-12-08 11:46:00 EST" "2017-12-08 05:20:00 EST"
[9] "2017-12-08 09:38:00 EST"
Upvotes: 2
Reputation: 28339
First get NA
for entries without MDY information. Then use tidyr::fill
to fill those NA
s with previous entries.
# Using OPs data stored in df (column V1)
foo <- ifelse(grepl("^[A-z]", df$V1), sub(" .*", "", df$V1), NA)
# [1] "Dec-13-17" "Dec-11-17" NA NA NA "Dec-09-17"
# [7] "Dec-08-17" NA "Dec-08-17"
library(dplyr)
library(tidyr)
data.frame(foo) %>%
fill(foo) %>%
mutate(foo = paste(foo, sub(".* ", "", df$V1)))
# foo
# 1 Dec-13-17 06:38AM
# 2 Dec-11-17 07:30AM
# 3 Dec-11-17 06:35AM
# 4 Dec-11-17 06:20AM
# 5 Dec-11-17 06:15AM
# 6 Dec-09-17 01:22PM
# 7 Dec-08-17 11:46AM
# 8 Dec-08-17 05:20AM
# 9 Dec-08-17 09:38AM
Upvotes: 1