Rearrangment of time series data

Question

I am not good at "R" and not sure how to rearrange and subset time series data. Sorry, if this question sounds stupid. I have a time series data of sea tide with four values per day (with missing values as well). Two values for high tide and two values for low tide. The time and date are given in the same column but in different rows. Now, I want to subset the data only for daytime (from 7:00 AM to 7:00 PM) not for night. Then I want to have data arranged in three columns only i) Date, ii) Time and iii) Tide. For Tide, I only need minimum and maximum values. Here is an example of the data and the desired arrangement of data. For each date, data is arranged in three rows similar to the example.

1/1/2011    Low High    Low NA
Time    2:58 AM 9:38 AM 5:19 PM NA
Tide    1.2 m   2.2 m   0.6 m   NA
1/2/2011    High    Low High    Low
Time    2:07 AM 4:22 AM 10:19 AM    6:07 PM
Tide    1.4 m   1.3 m   2.3 m   0.4 m

Date    Time    Tide
1/1/2011    17:19   0.6
1/1/2011    9:38    2.2
1/2/2011    2:07    1.4
1/2/2011    18:07   0.4

G. Grothendieck · Accepted Answer

The input, DF is assumed to be as in the Note below.

g, the grouping vector, has one element per row of DF and is equal to c(1, 1, 1, 2, 2, 2, ...). Alternate ways to compute g would be n <- nrow(DF); g <- gl(n, 3, n) or n <- nrow(DF); g <- rep(1:3, n, n).

We then use by to split DF into groups and apply the indicated anonymous function to each group as defined by g.

The anonymous function combines the date and the times in the current group to create the date/times dt making use of the fact that the common date is x[1,1] and the times prior to being cleaned up are in x[2,-1]. Using dt and the tides in x[2, -1] (prior to being cleaned up) it computes each of the three columns arranging them into a data frame. Then there is a commented out line which removes NA values. If you want this uncomment it. Subset the data frame obtained so far to the 7am to 7pm time period and further take the two rows consisting of the min and max tide. We sort that by time.

Finally do.call("rbind", ...) puts the groups together into one overall data frame.

No packages are used.

g <- cumsum(grepl("\d", DF$V1))
Long <- do.call("rbind", by(DF, g, function(x) {
  dt <- as.POSIXct(paste(x[1,1], as.matrix(x[2, -1])), format = "%m/%d/%Y %I:%M %p")
  X <- data.frame(Date = as.Date(dt),
                  Time = format(dt, "%H:%M"),
                  Tide = as.numeric(sub("m", "", as.matrix(x[3, -1]))),
                  stringsAsFactors = FALSE)
  # X <- na.omit(X)
  X <- subset(X, Time >= "07:00" & Time <= "19:00")
  X <- X[c(which.min(X$Tide), which.max(X$Tide)), ]
  X[order(X$Time), ]
}))

giving the following -- note that the third row in the question's output is not between 7am and 7pm so the output here necessarily differs.

> Long
          Date  Time Tide
1.2 2011-01-01 09:38  2.2
1.3 2011-01-01 17:19  0.6
2.3 2011-01-02 10:19  2.3
2.4 2011-01-02 18:07  0.4

Note: The input DF is assumed to be as follows in reproducible form:

Lines <- "1/1/2011,Low,High,Low,NA
Time,2:58 AM,9:38 AM,5:19 PM,NA
Tide,1.2 m,2.2 m,0.6 m,NA
1/2/2011,High,Low,High,Low
Time,2:07 AM,4:22 AM,10:19 AM,6:07 PM
Tide,1.4 m,1.3 m,2.3 m,0.4 m"
DF <- read.table(text = Lines, sep = ",", as.is = TRUE)

Rearrangment of time series data

Answers (2)

Related Questions