Reputation: 117
I am not good at "R" and not sure how to rearrange and subset time series data. Sorry, if this question sounds stupid. I have a time series data of sea tide with four values per day (with missing values as well). Two values for high tide and two values for low tide. The time and date are given in the same column but in different rows. Now, I want to subset the data only for daytime (from 7:00 AM to 7:00 PM) not for night. Then I want to have data arranged in three columns only i) Date, ii) Time and iii) Tide. For Tide, I only need minimum and maximum values. Here is an example of the data and the desired arrangement of data. For each date, data is arranged in three rows similar to the example.
1/1/2011 Low High Low NA
Time 2:58 AM 9:38 AM 5:19 PM NA
Tide 1.2 m 2.2 m 0.6 m NA
1/2/2011 High Low High Low
Time 2:07 AM 4:22 AM 10:19 AM 6:07 PM
Tide 1.4 m 1.3 m 2.3 m 0.4 m
Date Time Tide
1/1/2011 17:19 0.6
1/1/2011 9:38 2.2
1/2/2011 2:07 1.4
1/2/2011 18:07 0.4
Upvotes: 1
Views: 124
Reputation: 269644
The input, DF
is assumed to be as in the Note below.
g
, the grouping vector, has one element per row of DF
and is equal to c(1, 1, 1, 2, 2, 2, ...)
. Alternate ways to compute g
would be n <- nrow(DF); g <- gl(n, 3, n)
or n <- nrow(DF); g <- rep(1:3, n, n)
.
We then use by
to split DF
into groups and apply the indicated anonymous function to each group as defined by g
.
The anonymous function combines the date and the times in the current group to create the date/times dt
making use of the fact that the common date is x[1,1]
and the times prior to being cleaned up are in x[2,-1]
.
Using dt
and the tides in x[2, -1]
(prior to being cleaned up) it computes each of the three columns arranging them into a data frame. Then there is a commented out line which removes NA values. If you want this uncomment it. Subset the data frame obtained so far to the 7am to 7pm time period and further take the two rows consisting of the min and max tide. We sort that by time.
Finally do.call("rbind", ...)
puts the groups together into one overall data frame.
No packages are used.
g <- cumsum(grepl("\\d", DF$V1))
Long <- do.call("rbind", by(DF, g, function(x) {
dt <- as.POSIXct(paste(x[1,1], as.matrix(x[2, -1])), format = "%m/%d/%Y %I:%M %p")
X <- data.frame(Date = as.Date(dt),
Time = format(dt, "%H:%M"),
Tide = as.numeric(sub("m", "", as.matrix(x[3, -1]))),
stringsAsFactors = FALSE)
# X <- na.omit(X)
X <- subset(X, Time >= "07:00" & Time <= "19:00")
X <- X[c(which.min(X$Tide), which.max(X$Tide)), ]
X[order(X$Time), ]
}))
giving the following -- note that the third row in the question's output is not between 7am and 7pm so the output here necessarily differs.
> Long
Date Time Tide
1.2 2011-01-01 09:38 2.2
1.3 2011-01-01 17:19 0.6
2.3 2011-01-02 10:19 2.3
2.4 2011-01-02 18:07 0.4
Note: The input DF
is assumed to be as follows in reproducible form:
Lines <- "1/1/2011,Low,High,Low,NA
Time,2:58 AM,9:38 AM,5:19 PM,NA
Tide,1.2 m,2.2 m,0.6 m,NA
1/2/2011,High,Low,High,Low
Time,2:07 AM,4:22 AM,10:19 AM,6:07 PM
Tide,1.4 m,1.3 m,2.3 m,0.4 m"
DF <- read.table(text = Lines, sep = ",", as.is = TRUE)
Upvotes: 1
Reputation: 5898
If the list is not too long, this endeavour would be simpler to do in a spreadsheet simply by mapping cells and filtering. But one way to do it in R with zoo and tidyverse is the following:
Assuming that the original dataframes have their columns named as C1:C5
C1 C2 C3 C4 C5
<chr> <chr> <chr> <chr> <chr>
1 1/1/2010 Low High Low <NA>
2 Time 2:58 AM 9:38 AM 5:19 PM <NA>
3 Tide 1.2 2.2 0.6 <NA>
4 1/2/2011 High Low High Low
5 Time 2:07 AM 4:22 AM 10:19 AM 6:07 PM
6 Tide 1.4 1.3 2.3 0.4
DF <- DF %>%
mutate(Date = as.Date(gsub("Tide|Time","", C1), format = "%d/%m/%Y"))
DF <- DF %>%
mutate(Date = na.locf(DF$Date, na.rm = TRUE),
C1 = gsub("[[:digit:]]|\\/", "", C1),
Type = if_else(nchar(C1) == 0, "TideType", C1)) %>%
select(Date, Type, C2:C5) %>%
gather(oColumn, Value, -c(Date, Type)) %>%
spread(key = Type, value = Value) %>%
select(Date, Time, Tide) %>%
filter(complete.cases(.))
DF <- DF %>%
mutate(Time = ymd_hm(paste(DF$Date, DF$Time, sep = " ")),
Tide = as.numeric(Tide))
DF <- DF %>%
mutate(DayNight = (DF$Time) %within%
interval(as.POSIXlt(DF$Date) + (7*60*60), as.POSIXlt(DF$Date) + (19*60*60))) %>%
filter(DayNight == TRUE) %>%
select(-DayNight) ) %>%
group_by(Date) %>%
filter(Tide == max(Tide) | min(Tide))
DF
Source: local data frame [4 x 3]
Groups: Date [2]
Date Time Tide
<date> <dttm> <dbl>
1 2010-01-01 2010-01-01 09:38:00 2.2
2 2010-01-01 2010-01-01 17:19:00 0.6
3 2011-02-01 2011-02-01 10:19:00 2.3
4 2011-02-01 2011-02-01 18:07:00 0.4
Note that "Date" is a Date type of Object and "Time" is a Posixct type of Date-Time Object. You might want to convert "Time" into a vector of minutes.
Upvotes: 0