FernRay

Reputation: 77

How to pick hourly values from dataset?

I need help with this issue:

I have a dataset of water level values recorded every 30 minutes, but I need only the hourly values. I tried the aggregate() function, but it requires a FUN argument, which forces the result to be a summary such as the mean or median, and I don't want to apply any statistical function.

This is an example of my data frame:

06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996

Upvotes: 6

Views: 457

Answers (2)

akrun

Reputation: 886968

We convert the 'RefDateTimeRef' column to POSIXct, extract the minutes and seconds with format, and compare the result with "00:00" to get a logical vector, which we use to subset the rows.

df1[format(as.POSIXct(df1[,1], format = "%m/%d/%Y %H:%M"), "%M:%S")=="00:00",]
#     RefDateTimeRef  Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148

Or with lubridate

library(lubridate)
df1[ minute(mdy_hm(df1[,1]))==0,]
#     RefDateTimeRef  Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148

Or with sub to remove everything up to and including the hour (leaving only the minutes), then use == to get the logical vector and subset the rows.

df1[ sub(".*\\s+\\S{2}:", "", df1[,1])=="00",]

NOTE: I would advise against using sub or substr, as string matching can give incorrect answers when the timestamp format varies.
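The timestamps in the question include seconds, so the format string in the first approach above would need a %S. A minimal sketch under that assumption, rebuilding a few of the question's rows in a data frame named df1 with the same column names as above (the tz = "UTC" setting is also an assumption, chosen so daylight-saving gaps cannot produce NA values):

```r
# A few rows in the question's layout: timestamp with seconds, then value.
df1 <- data.frame(
  RefDateTimeRef = c("06/16/2015 02:00:00", "06/16/2015 02:30:00",
                     "06/16/2015 03:00:00", "06/16/2015 03:30:00"),
  Data = c(0.036068, 0.008916, -0.008622, -0.014057),
  stringsAsFactors = FALSE
)

# Parse including seconds; tz = "UTC" is an assumption, not from the question.
stamp <- as.POSIXct(df1$RefDateTimeRef,
                    format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only rows whose minutes and seconds are both zero.
hourly <- df1[format(stamp, "%M:%S") == "00:00", ]
hourly
```

This keeps the 02:00:00 and 03:00:00 rows and drops the half-hour readings, without computing any summary statistic.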

Upvotes: 3

Kunal Puri

Reputation: 3427

df <- read.table(text = '06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996')

colnames(df) <- c('Date','Time','Value')

# TRUE for rows on the hour: characters 4 onward of "HH:MM:SS" are "MM:SS"
index <- substring(df$Time, 4) == "00:00"

final_df <- df[index,]
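The substring test above depends on the time always being written as HH:MM:SS. As a possible alternative, here is a sketch that parses the Date and Time columns into a date-time and tests the minute and second fields directly. The on_the_hour name and the tz = "UTC" setting are assumptions for illustration, and only the first four rows of the data are reproduced:

```r
# First rows of the question's data, read into the same three columns.
df <- read.table(text = "06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057",
                 col.names = c("Date", "Time", "Value"),
                 stringsAsFactors = FALSE)

# Combine date and time, then parse; POSIXlt exposes the fields by name.
stamp <- as.POSIXlt(paste(df$Date, df$Time),
                    format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# A row is "hourly" when both the minute and second fields are zero.
on_the_hour <- stamp$min == 0 & stamp$sec == 0

final_df <- df[on_the_hour, ]
final_df
```

Because the comparison is made on parsed fields rather than character positions, a malformed timestamp shows up as NA instead of silently matching or failing to match.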

Upvotes: 3
