multiporpoise
multiporpoise

Reputation: 1

R nested for loop to fill in blank variables until reaching a specific timestamp

I'm working with a dataset that has timestamps(POSIXct formatted) for every half hour (but there are duplicates for some time chunks, and not the same number of duplicates for each). There are three columns A, B1, B2 that note the location of a ship in a harbor. There are two more columns with the timestamp for arrival and departure of each ship. I want to carry forward the A's, B1's, and B2's until reaching the departure timestamp to actually denote how long the ships are present not just when they arrive.

Here's what the table looks like (with many many more rows... 2 million...)

           [Timestamp]           [A]       [B1]        [B2]         [Arrival]             [Departure]        
      [1,] "2018-04-19 08:00:00" "A"        NA          NA          "2018-04-19 08:00:00" "2018-04-20 06:00:00"
      [2,] "2018-04-29 07:00:00" "A"        NA          NA          "2018-04-29 07:00:00" "2018-04-29 15:00:00"
      [3,] "2018-04-30 08:00:00" "A"        NA          NA          "2018-04-30 08:00:00" "2018-04-30 18:00:00"
      [4,] "2018-05-11 08:00:00" "A"        NA          NA          "2018-05-11 08:00:00" "2018-05-11 17:00:00"
      [5,] "2018-05-14 08:00:00" "A"        NA          NA          "2018-05-14 08:00:00" "2018-05-14 18:00:00"
      [6,] "2018-05-18 08:00:00" "A"        NA          NA          "2018-05-18 08:00:00" "2018-05-18 17:00:00"
      [7,] "2018-05-20 07:00:00" NA         "B1"        NA          "2018-05-20 07:00:00" "2018-05-20 17:00:00"
      [8,] "2018-05-20 08:00:00" "A"        NA          NA          "2018-05-20 08:00:00" "2018-05-20 17:00:00"
      [9,] "2018-05-22 07:00:00" "A"        NA          NA          "2018-05-22 07:00:00" "2018-05-22 22:00:00"
     [10,] "2018-05-27 07:00:00" "A"        NA          NA          "2018-05-27 07:00:00" "2018-05-27 15:00:00"
     [11,] "2018-06-01 00:00:00" NA         NA          NA          NA                    NA                   
     [12,] "2018-06-01 00:30:00" NA         NA          NA          NA                    NA                   
     [13,] "2018-06-01 01:00:00" NA         NA          NA          NA                    NA                   
     [14,] "2018-06-01 01:30:00" NA         NA          NA          NA                    NA                   
     [15,] "2018-06-01 02:00:00" NA         NA          NA          NA                    NA                   
     [16,] "2018-06-01 02:30:00" NA         NA          NA          NA                    NA                   
     [17,] "2018-06-01 03:00:00" NA         NA          NA          NA                    NA                   
     [18,] "2018-06-01 03:30:00" NA         NA          NA          NA                    NA                   
     [19,] "2018-06-01 04:00:00" NA         NA          NA          NA                    NA                   
     [20,] "2018-06-01 04:30:00" NA         NA          NA          NA                    NA                   
     [21,] "2018-06-01 05:00:00" NA         NA          NA          NA                    NA                   
     [22,] "2018-06-01 05:30:00" NA         NA          NA          NA                    NA                   
     [23,] "2018-06-01 06:00:00" NA         NA          NA          NA                    NA                   
     [24,] "2018-06-01 06:30:00" NA         NA          NA          NA                    NA                   
     [25,] "2018-06-01 07:00:00" NA         "B1"        NA          "2018-06-01 07:00:00" "2018-06-01 22:00:00"
     [26,] "2018-06-01 07:30:00" NA          NA         NA          NA                    NA                   
     [27,] "2018-06-01 08:00:00" "A"         NA         NA          "2018-06-01 08:00:00" "2018-06-01 17:00:00"
     [28,] "2018-06-01 08:30:00"  NA         NA         NA          NA                    NA                   
     [29,] "2018-06-01 09:00:00"  NA         NA         NA          NA                    NA                   
     [30,] "2018-06-01 09:30:00"  NA         NA         NA          NA                    NA                   
     [31,] "2018-06-01 10:00:00"  NA         NA         NA          NA                    NA                   
     [32,] "2018-06-01 10:30:00"  NA         NA         NA          NA                    NA                   
     [33,] "2018-06-01 11:00:00"  NA         NA         NA          NA                    NA                   
     [34,] "2018-06-01 11:30:00"  NA         NA         NA          NA                    NA                   
     [35,] "2018-06-01 12:00:00"  NA         NA         NA          NA                    NA                   
     [36,] "2018-06-01 12:30:00"  NA         NA         NA          NA                    NA                   
     [37,] "2018-06-01 13:00:00"  NA         NA         NA          NA                    NA                   
     [38,] "2018-06-01 13:30:00"  NA         NA         NA          NA                    NA                   
     [39,] "2018-06-01 14:00:00"  NA         NA         NA          NA                    NA                   
     [40,] "2018-06-01 14:30:00"  NA         NA         NA          NA                    NA                   
     [41,] "2018-06-01 15:00:00"  NA         NA         NA          NA                    NA                   
     [42,] "2018-06-01 15:30:00"  NA         NA         NA          NA                    NA                   
     [43,] "2018-06-01 16:00:00"  NA         NA         NA          NA                    NA                   
     [44,] "2018-06-01 16:30:00"  NA         NA         NA          NA                    NA                   
     [45,] "2018-06-01 17:00:00"  NA         NA         NA          NA                    NA                   
     [46,] "2018-06-01 17:30:00"  NA         NA         NA          NA                    NA                   
     [47,] "2018-06-01 18:00:00"  NA         NA         NA          NA                    NA                   
     [48,] "2018-06-01 18:30:00"  NA         NA         NA          NA                    NA                   
     [49,] "2018-06-01 19:00:00"  NA         NA         NA          NA                    NA                   
     [50,] "2018-06-01 19:30:00"  NA         NA         NA          NA                    NA                   
     [51,] "2018-06-01 20:00:00"  NA         NA         NA          NA                    NA                   
     [52,] "2018-06-01 20:30:00"  NA         NA         NA          NA                    NA                   
     [53,] "2018-06-01 21:00:00"  NA         NA         NA          NA                    NA                   
     [54,] "2018-06-01 21:30:00"  NA         NA         NA          NA                    NA                   
     [55,] "2018-06-01 22:00:00"  NA         NA         NA          NA                    NA                   
     [56,] "2018-06-01 22:30:00" NA         NA          NA          NA                    NA                   
     [57,] "2018-06-01 23:00:00" NA         NA          NA          NA                    NA 

And this is what I'm looking for:

           [Timestamp]           [A]        [B1]       [B2]        [Arrival]               [Departure]        
      [1,] "2018-04-19 08:00:00" "A"        NA          NA          "2018-04-19 08:00:00" "2018-04-20 06:00:00"
      [2,] "2018-04-29 07:00:00" "A"        NA          NA          "2018-04-29 07:00:00" "2018-04-29 15:00:00"
      [3,] "2018-04-30 08:00:00" "A"        NA          NA          "2018-04-30 08:00:00" "2018-04-30 18:00:00"
      [4,] "2018-05-11 08:00:00" "A"        NA          NA          "2018-05-11 08:00:00" "2018-05-11 17:00:00"
      [5,] "2018-05-14 08:00:00" "A"        NA          NA          "2018-05-14 08:00:00" "2018-05-14 18:00:00"
      [6,] "2018-05-18 08:00:00" "A"        NA          NA          "2018-05-18 08:00:00" "2018-05-18 17:00:00"
      [7,] "2018-05-20 07:00:00" NA         "B1"        NA          "2018-05-20 07:00:00" "2018-05-20 17:00:00"
      [8,] "2018-05-20 08:00:00" "A"        "B1"        NA          "2018-05-20 08:00:00" "2018-05-20 17:00:00"
      [9,] "2018-05-22 07:00:00" "A"        NA          NA          "2018-05-22 07:00:00" "2018-05-22 22:00:00"
     [10,] "2018-05-27 07:00:00" "A"        NA          NA          "2018-05-27 07:00:00" "2018-05-27 15:00:00"
     [11,] "2018-06-01 00:00:00" NA         NA          NA          NA                    NA                   
     [12,] "2018-06-01 00:30:00" NA         NA          NA          NA                    NA                   
     [13,] "2018-06-01 01:00:00" NA         NA          NA          NA                    NA                   
     [14,] "2018-06-01 01:30:00" NA         NA          NA          NA                    NA                   
     [15,] "2018-06-01 02:00:00" NA         NA          NA          NA                    NA                   
     [16,] "2018-06-01 02:30:00" NA         NA          NA          NA                    NA                   
     [17,] "2018-06-01 03:00:00" NA         NA          NA          NA                    NA                   
     [18,] "2018-06-01 03:30:00" NA         NA          NA          NA                    NA                   
     [19,] "2018-06-01 04:00:00" NA         NA          NA          NA                    NA                   
     [20,] "2018-06-01 04:30:00" NA         NA          NA          NA                    NA                   
     [21,] "2018-06-01 05:00:00" NA         NA          NA          NA                    NA                   
     [22,] "2018-06-01 05:30:00" NA         NA          NA          NA                    NA                   
     [23,] "2018-06-01 06:00:00" NA         NA          NA          NA                    NA                   
     [24,] "2018-06-01 06:30:00" NA         NA          NA          NA                    NA                   
     [25,] "2018-06-01 07:00:00" NA         "B1"        NA          "2018-06-01 07:00:00" "2018-06-01 22:00:00"
     [26,] "2018-06-01 07:30:00" NA         "B1"        NA          NA                    NA                   
     [27,] "2018-06-01 08:00:00" "A"        "B1"        NA          "2018-06-01 08:00:00" "2018-06-01 17:00:00"
     [28,] "2018-06-01 08:30:00" "A"        "B1"        NA          NA                    NA                   
     [29,] "2018-06-01 09:00:00" "A"        "B1"        NA          NA                    NA                   
     [30,] "2018-06-01 09:30:00" "A"        "B1"        NA          NA                    NA                   
     [31,] "2018-06-01 10:00:00" "A"        "B1"        NA          NA                    NA                   
     [32,] "2018-06-01 10:30:00" "A"        "B1"        NA          NA                    NA                   
     [33,] "2018-06-01 11:00:00" "A"        "B1"        NA          NA                    NA                   
     [34,] "2018-06-01 11:30:00" "A"        "B1"        NA          NA                    NA                   
     [35,] "2018-06-01 12:00:00" "A"        "B1"        NA          NA                    NA                   
     [36,] "2018-06-01 12:30:00" "A"        "B1"        NA          NA                    NA                   
     [37,] "2018-06-01 13:00:00" "A"        "B1"        NA          NA                    NA                   
     [38,] "2018-06-01 13:30:00" "A"        "B1"        NA          NA                    NA                   
     [39,] "2018-06-01 14:00:00" "A"        "B1"        NA          NA                    NA                   
     [40,] "2018-06-01 14:30:00" "A"        "B1"        NA          NA                    NA                   
     [41,] "2018-06-01 15:00:00" "A"        "B1"        NA          NA                    NA                   
     [42,] "2018-06-01 15:30:00" "A"        "B1"        NA          NA                    NA                   
     [43,] "2018-06-01 16:00:00" "A"        "B1"        NA          NA                    NA                   
     [44,] "2018-06-01 16:30:00" "A"        "B1"        NA          NA                    NA                   
     [45,] "2018-06-01 17:00:00" "A"        "B1"        NA          NA                    NA                   
     [46,] "2018-06-01 17:30:00" NA         "B1"        NA          NA                    NA                   
     [47,] "2018-06-01 18:00:00" NA         "B1"        NA          NA                    NA                   
     [48,] "2018-06-01 18:30:00" NA         "B1"        NA          NA                    NA                   
     [49,] "2018-06-01 19:00:00" NA         "B1"        NA          NA                    NA                   
     [50,] "2018-06-01 19:30:00" NA         "B1"        NA          NA                    NA                   
     [51,] "2018-06-01 20:00:00" NA         "B1"        NA          NA                    NA                   
     [52,] "2018-06-01 20:30:00" NA         "B1"        NA          NA                    NA                   
     [53,] "2018-06-01 21:00:00" NA         "B1"        NA          NA                    NA                   
     [54,] "2018-06-01 21:30:00" NA         "B1"        NA          NA                    NA                   
     [55,] "2018-06-01 22:00:00" NA         "B1"        NA          NA                    NA                   
     [56,] "2018-06-01 22:30:00" NA          NA         NA          NA                    NA                   
     [57,] "2018-06-01 23:00:00" NA          NA         NA          NA                    NA 

Here's the data:

dput(loopdata[1:57,])
structure(list(FinalTimestamp = structure(c(1524142800, 1525003200, 
1525093200, 1526043600, 1526302800, 1526648400, 1526817600, 1526821200, 
1526990400, 1527422400, 1527829200, 1527831000, 1527832800, 1527834600, 
1527836400, 1527838200, 1527840000, 1527841800, 1527843600, 1527845400, 
1527847200, 1527849000, 1527850800, 1527852600, 1527854400, 1527856200, 
1527858000, 1527859800, 1527861600, 1527863400, 1527865200, 1527867000, 
1527868800, 1527870600, 1527872400, 1527874200, 1527876000, 1527877800, 
1527879600, 1527881400, 1527883200, 1527885000, 1527886800, 1527888600, 
1527890400, 1527892200, 1527894000, 1527895800, 1527897600, 1527899400, 
1527901200, 1527903000, 1527904800, 1527906600, 1527908400, 1527910200, 
1527912000), class = c("POSIXct", "POSIXt"), tzone = "EST"), 
    AnchorageA = c("A", "A", "A", "A", "A", "A", NA, "A", "A", 
    "A", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, "A", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), AnchorageB1 = c(NA, NA, NA, NA, NA, 
    NA, "B1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, "B1", NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), AnchorageB2 = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), TimestFormArr = structure(c(1524139200, 1524999600, 1525089600, 
    1526040000, 1526299200, 1526644800, 1526814000, 1526817600, 
    1526986800, 1527418800, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, 1527850800, NA, 1527854400, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", 
    "POSIXt"), tzone = ""), TimestFormDep = structure(c(1524218400, 
    1525028400, 1525125600, 1526072400, 1526335200, 1526677200, 
    1526850000, 1526850000, 1527040800, 1527447600, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1527904800, NA, 
    1527886800, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("FinalTimestamp", 
"AnchorageA", "AnchorageB1", "AnchorageB2", "TimestFormArr", 
"TimestFormDep"), row.names = c(NA, 57L), class = "data.frame")

Right now this is what I have for code to try to achieve this:

lastdate = 1
for(i in 1:length(loopdata$Timestamp))  
{
  if(i%%1000==0) print(i)
  if(!is.na(loopdata$Arrival[i]))
  {lastdate=i}
  if(loopdata$Timestamp[i] >= loopdata$Arrival[lastdate] & 
     loopdata$Timestamp[i] <= loopdata$Departure[lastdate])
  {loopdata[i,2:4]=loopdata[lastdate,2:4]}
}

The above code runs but it doesn't work. I usually stop it after 5,000 rows to check it (hence the print(i)) and there are no error messages. It carries forward the A's but it won't carry forward the B1's or B2's. Is this because it resets again because the first B1 is so quickly followed by an A? I've also tried using the same loop code with specific arrival and departure timestamps for B1 and B2 with an error message (Help! Thank you!

Upvotes: 0

Views: 38

Answers (1)

Uwe
Uwe

Reputation: 42564

If I understand correctly, OP's dataset contains timestamps of arrival and departure of ships in a harbour and where the ships have berthed. Apparently, the original data have already been reshaped to wide format (with a separate column for each anchorage) and expanded to a 30 minute time raster. Now, the OP is struggeling to populate the anchorage columns to denote how long the ships are present.

My answer consists of 3 parts:

  1. Retrieve arrival and departure times and anchorage for each visit of a ship.
  2. Expand these data to 30 minutes time raster.
  3. Show a compacted version (only overlapping periods of ship's visits unsing foverlaps()).

Ship's visits

library(data.table)
time_cols <- c("Arrival", "Departure")
ships_in_harbour <- melt(
  setDT(loopdata), id.var = time_cols, 
  measure.vars = patterns("Anchorage"), na.rm = TRUE,
  value.name = "Anchorage")[
    , variable := NULL][
      # set time zone to EST for all time columns
      , (time_cols) := lapply(.SD, lubridate::force_tz, "EST"), .SDcols = time_cols]

ships_in_harbour[]
                Arrival           Departure Anchorage id
 1: 2018-04-19 14:00:00 2018-04-20 12:00:00         A  1
 2: 2018-04-29 13:00:00 2018-04-29 21:00:00         A  2
 3: 2018-04-30 14:00:00 2018-05-01 00:00:00         A  3
 4: 2018-05-11 14:00:00 2018-05-11 23:00:00         A  4
 5: 2018-05-14 14:00:00 2018-05-15 00:00:00         A  5
 6: 2018-05-18 14:00:00 2018-05-18 23:00:00         A  6
 7: 2018-05-20 14:00:00 2018-05-20 23:00:00         A  7
 8: 2018-05-22 13:00:00 2018-05-23 04:00:00         A  8
 9: 2018-05-27 13:00:00 2018-05-27 21:00:00         A  9
10: 2018-06-01 14:00:00 2018-06-01 23:00:00         A 10
11: 2018-05-20 13:00:00 2018-05-20 23:00:00        B1 11
12: 2018-06-01 13:00:00 2018-06-02 04:00:00        B1 12

Each row of this dataset denotes a visit of a ship.

It can be visualised by

library(ggplot2)
ggplot(ships_in_harbour) + 
  aes(x = Arrival, xend = Departure, y = Anchorage, yend = Anchorage) +
  geom_segment()

enter image description here

Note that there is an overlap, i.e., two ships visiting the harbour at the same time on 2018-05-20 (rows 7 and 11) and 2018-06-01 (rows 10 and 12).

Expand to 30 minutes raster

library(magrittr)   # piping used to improve readability
result <- ships_in_harbour[
  , c(.(time = seq(lubridate::ceiling_date(Arrival, "30 min"),
                   lubridate::floor_date(Departure, "30 min"),
                   by = "30 min")), .SD), 
  by = .(id = seq_len(nrow(ships_in_harbour)))] %>% 
  # reshape to wide format
  dcast(time ~ Anchorage, toString, value.var = "id")

result    
                    time A B1
  1: 2018-04-19 14:00:00 1   
  2: 2018-04-19 14:30:00 1   
  3: 2018-04-19 15:00:00 1   
  4: 2018-04-19 15:30:00 1   
  5: 2018-04-19 16:00:00 1   
 ---                         
238: 2018-06-02 02:00:00   12
239: 2018-06-02 02:30:00   12
240: 2018-06-02 03:00:00   12
241: 2018-06-02 03:30:00   12
242: 2018-06-02 04:00:00   12

This is a rectangular table of time versus anchorage where the contents of each cell indicate whether there was a ship berthed at a particular anchorage. In addition to OP's expected result, the id of the visi is given.

However, due to the fixed interval of 30 minutes the relevant information is difficult to find in the bulk of data.

tail(result, 22L)
                   time  A B1
 1: 2018-06-01 17:30:00 10 12
 2: 2018-06-01 18:00:00 10 12
 3: 2018-06-01 18:30:00 10 12
 4: 2018-06-01 19:00:00 10 12
 5: 2018-06-01 19:30:00 10 12
 6: 2018-06-01 20:00:00 10 12
 7: 2018-06-01 20:30:00 10 12
 8: 2018-06-01 21:00:00 10 12
 9: 2018-06-01 21:30:00 10 12
10: 2018-06-01 22:00:00 10 12
11: 2018-06-01 22:30:00 10 12
12: 2018-06-01 23:00:00 10 12
13: 2018-06-01 23:30:00    12
14: 2018-06-02 00:00:00    12
15: 2018-06-02 00:30:00    12
16: 2018-06-02 01:00:00    12
17: 2018-06-02 01:30:00    12
18: 2018-06-02 02:00:00    12
19: 2018-06-02 02:30:00    12
20: 2018-06-02 03:00:00    12
21: 2018-06-02 03:30:00    12
22: 2018-06-02 04:00:00    12
                   time  A B1

Overlapping periods

A more compact view can be created by finding the overlaps of all ship's visits and reshaping to wide format:

ships_in_harbour[, id := .I]
time_of_event <-  ships_in_harbour[, c(Arrival, Departure) %>% sort() %>% unique]
foverlaps(
  ships_in_harbour, 
  data.table(start = head(time_of_event, -1L), 
             end = tail(time_of_event, -1L) - lubridate::seconds(0), 
             key = "start,end"), by.x = time_cols) %>% 
  .[, .(start = pmax(start, Arrival), end = pmin(end, Departure), id, Anchorage)] %>% 
  .[start < end] %>% 
  dcast(., start + end ~ Anchorage, toString, value.var = "id")
                  start                 end  A B1
 1: 2018-04-19 14:00:00 2018-04-20 12:00:00  1   
 2: 2018-04-29 13:00:00 2018-04-29 21:00:00  2   
 3: 2018-04-30 14:00:00 2018-05-01 00:00:00  3   
 4: 2018-05-11 14:00:00 2018-05-11 23:00:00  4   
 5: 2018-05-14 14:00:00 2018-05-15 00:00:00  5   
 6: 2018-05-18 14:00:00 2018-05-18 23:00:00  6   
 7: 2018-05-20 13:00:00 2018-05-20 14:00:00    11
 8: 2018-05-20 14:00:00 2018-05-20 23:00:00  7 11
 9: 2018-05-22 13:00:00 2018-05-23 04:00:00  8   
10: 2018-05-27 13:00:00 2018-05-27 21:00:00  9   
11: 2018-06-01 13:00:00 2018-06-01 14:00:00    12
12: 2018-06-01 14:00:00 2018-06-01 23:00:00 10 12
13: 2018-06-01 23:00:00 2018-06-02 04:00:00    12

Now, it is better visible which anchorages were occupied in parallel during ships' visits.

Upvotes: 1

Related Questions