redbaron1981
redbaron1981

Reputation: 417

Fill missing date with previous entry

I have a character object containing dates that looks like this;

Dec-13-17 06:38AM
Dec-11-17 07:30AM
06:35AM
06:20AM  
06:15AM 
Dec-09-17 01:22PM
Dec-08-17 11:46AM 
05:20AM 
Dec-08-17 09:38AM

Problem I have is that a few of these entries are missing the date value. The 6:35, 6:20 & 5:15 are all missing Dec-11-17 and the 05:20AM is missing Dec-08-17.

I wrote quite a lengthy ifelse statement that would always look back to the next complete date by using str_count() to find the next complete datetime then using gsub to extract the date only part then paste it with the existing time entry like this;

paste0(gsub( " .*$", "", datetime), <existingdate>)

Its quite messy as it contains lots of nested ifelse statements but the problem is that I never know how many rows above the incomplete entry I have to look to extract the date from. As this data is scraped from the web my nested if else statements could be very long to accommodate this and I am sure there is a better way of doing this.

Any help much appreciated!

Upvotes: 2

Views: 90

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269624

Assuming that the input is s below, extract the date returning it or NA if none giving dat. Also extract the time giving tim. Run na.locf on the date to fill in the missing dates and paste the date and time together.

s <- c("Dec-13-17 06:38AM", "Dec-11-17 07:30AM", "06:35AM", "06:20AM", 
  "06:15AM", "Dec-09-17 01:22PM", "Dec-08-17 11:46AM", "05:20AM", 
  "Dec-08-17 09:38AM")

library(zoo)
dat <-  ifelse(grepl("-", s), sub(" .*", "", s), NA)
tim = sub(".* ", "", s)
dattim <- paste(na.locf(dat), tim)
as.POSIXct(dattim, format = "%b-%d-%y %I:%M%p")

giving:

[1] "2017-12-13 06:38:00 EST" "2017-12-11 07:30:00 EST"
[3] "2017-12-11 06:35:00 EST" "2017-12-11 06:20:00 EST"
[5] "2017-12-11 06:15:00 EST" "2017-12-09 13:22:00 EST"
[7] "2017-12-08 11:46:00 EST" "2017-12-08 05:20:00 EST"
[9] "2017-12-08 09:38:00 EST"

Upvotes: 2

pogibas
pogibas

Reputation: 28339

First get NA for entries without MDY information. Then use tidyr::fill to fill those NAs with previous entries.

# Using OPs data stored in df (column V1)
foo <- ifelse(grepl("^[A-z]", df$V1), sub(" .*", "", df$V1), NA)
# [1] "Dec-13-17" "Dec-11-17" NA          NA          NA          "Dec-09-17"
# [7] "Dec-08-17" NA          "Dec-08-17"

library(dplyr)
library(tidyr)
data.frame(foo) %>% 
    fill(foo) %>% 
    mutate(foo = paste(foo, sub(".* ", "", df$V1)))

#                 foo
# 1 Dec-13-17 06:38AM
# 2 Dec-11-17 07:30AM
# 3 Dec-11-17 06:35AM
# 4 Dec-11-17 06:20AM
# 5 Dec-11-17 06:15AM
# 6 Dec-09-17 01:22PM
# 7 Dec-08-17 11:46AM
# 8 Dec-08-17 05:20AM
# 9 Dec-08-17 09:38AM

Upvotes: 1

Related Questions