skunkwerx

Reputation: 11

Need to subset data frame by specific times using R

I have a data frame called summer.q.0 that contains time (X) and 4 temperatures (X0, X0.33, X0.66, and X1). I need to subset the data frame, extracting rows at specific times. Here is the head of my data frame:

                   X     X0  X0.33  X0.66     X1
1 6/28/2009 19:30:00 30.760 27.340 25.800 24.600
2 6/28/2009 19:32:24 30.740 27.352 25.804 24.599
3 6/28/2009 19:35:16 30.740 27.363 25.807 24.599
4 6/28/2009 19:38:44 30.740 27.372 25.808 24.599
5 6/28/2009 19:40:00 30.740 27.375 25.809 24.599
6 6/28/2009 19:42:24 30.703 27.379 25.810 24.599
7 6/28/2009 19:45:16 30.703 27.384 25.810 24.599
8 6/28/2009 19:48:44 30.703 27.390 25.811 24.599

In column X the minutes occur at irregular times. I need to extract the rows where the minute value in column X is on the quarter hour (i.e. rows where minutes = 00, 15, 30, or 45). In the data frame the values in the X column are factors. I realize that they need to be converted to a recognizable R date-time format before extracting the required rows, but I am not sure of the best or easiest method. Then, after converting X to the proper R date and time format, what is the easiest way to extract/subset the required rows?

For example: assuming my data frame was just the 8 rows above (the full data set has 33601 rows and is modeled temperature time-series data), I would have a subset that looks like this, with a new time format as below:

                    X     X0  X0.33  X0.66     X1
1 2009-06-28 19:30:00 30.760 27.340 25.800 24.600
2 2009-06-28 19:45:16 30.703 27.384 25.810 24.599

In the subset, I need all the rows where minutes are exactly 00, 15, 30 and 45 (15 minute data) so that I can compare it to a measured data set that is already in 15 minute intervals and with time in POSIXlt format. So, the subset would contain rows 1 and 7 from the original data frame of 8 rows. Ideally I would set the seconds in the subset to 00; not sure if this can be accomplished easily. The reasoning is that the modeled and measured data will not match exactly, so seconds are not that important when comparing plots of both data sets. Hope this makes sense. Thanks!

Upvotes: 1

Views: 654

Answers (3)

Olivier Grimard

Reputation: 77

I managed to do something similar.

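Your column X has to be in the right date-time format first. In your case, using the lubridate and dplyr packages:

library(dplyr)
library(lubridate)

# as.character() guards against X being a factor; mdy_hms() takes no format argument
df <- df %>%
  mutate(X = mdy_hms(as.character(X)))

Then filter on the minute:

df %>%
  filter(minute(X) %in% c(0, 15, 30, 45))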
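
The question also asks about setting the seconds to 00, which the filter alone does not do. A minimal sketch of one way to handle that, assuming lubridate's floor_date() is acceptable for dropping the seconds; the three-row `df` below is a made-up cut of the question's data, and `quarter` is just an illustrative name:

```r
library(dplyr)
library(lubridate)

# Hypothetical sample: three timestamps and one temperature column from the question
df <- data.frame(
  X  = c("6/28/2009 19:30:00", "6/28/2009 19:32:24", "6/28/2009 19:45:16"),
  X0 = c(30.760, 30.740, 30.703)
)

quarter <- df %>%
  mutate(X = mdy_hms(as.character(X))) %>%       # parse to POSIXct (UTC by default)
  filter(minute(X) %in% c(0, 15, 30, 45)) %>%    # keep quarter-hour minutes only
  mutate(X = floor_date(X, unit = "minute"))     # round down so seconds become 00

print(quarter)
```

Rounding down rather than to the nearest minute keeps 19:45:16 as 19:45:00, which seems to be what the question wants, but that choice is a guess.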

Upvotes: 0

Parfait

Reputation: 107642

Consider the base R solution:

# CONVERT COLUMN TO DATE-TIME (POSIXlt)
df$X <- strptime(df$X, "%m/%d/%Y %H:%M:%S")

# SUBSET BY NUMERIC MINUTE (convert to numeric first, then compare)
mins <- as.numeric(format(df$X, "%M"))
df <- df[mins == 0 | mins == 15 | mins == 30 | mins == 45, ]

# LESS VERBOSE SUBSET (format() returns zero-padded strings such as "00")
df <- subset(df, format(X, "%M") %in% c("00", "15", "30", "45"))
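
For anyone who wants to try this end to end, here is a rough base R sketch on the 8 rows from the question, with X left as a factor as described there. The tz = "UTC" setting, the switch to POSIXct (easier to keep in a data frame column than POSIXlt) and the names `mins` and `quarter` are assumptions, not part of the original post:

```r
# Sample rows copied from the question; X is deliberately a factor
summer.q.0 <- data.frame(
  X = factor(c("6/28/2009 19:30:00", "6/28/2009 19:32:24", "6/28/2009 19:35:16",
               "6/28/2009 19:38:44", "6/28/2009 19:40:00", "6/28/2009 19:42:24",
               "6/28/2009 19:45:16", "6/28/2009 19:48:44")),
  X0    = c(30.760, 30.740, 30.740, 30.740, 30.740, 30.703, 30.703, 30.703),
  X0.33 = c(27.340, 27.352, 27.363, 27.372, 27.375, 27.379, 27.384, 27.390),
  X0.66 = c(25.800, 25.804, 25.807, 25.808, 25.809, 25.810, 25.810, 25.811),
  X1    = c(24.600, 24.599, 24.599, 24.599, 24.599, 24.599, 24.599, 24.599)
)

# Factor -> character -> POSIXct (the time zone is an assumption)
summer.q.0$X <- as.POSIXct(as.character(summer.q.0$X),
                           format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only rows whose minute falls on the quarter hour
mins <- as.integer(format(summer.q.0$X, "%M"))
quarter <- summer.q.0[mins %in% c(0, 15, 30, 45), ]

# Drop the seconds so the times line up with 15-minute measured data
quarter$X <- as.POSIXct(trunc(quarter$X, units = "mins"))

print(quarter)
```

On this sample only rows 1 and 7 remain, stamped 19:30:00 and 19:45:00, matching the subset described in the question.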

Upvotes: 0

Pash101

Reputation: 641

If you provide a slightly fuller example we can help more, but this should get you started.

library(lubridate)

# this returns just the minutes from each date-time as a zero-padded string ("00" to "59");
# they go into a new column so the original timestamps in X are kept
your_data$mins <- format(mdy_hms(as.character(your_data$X)), format = "%M")
example <- format(mdy_hms("6/28/2009 19:30:00"), format = "%M")  # "30"

# you can then subset your data frame using something like this
subset_data <- subset(your_data, mins %in% c("00", "15", "30", "45"))

Upvotes: 0
