Reputation: 1
im really new at R and i have a problem to solve. I have a dataframe like this
str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 70128 obs. of 2 variables:
$ date: POSIXct, format: "2009-01-01 00:00:00" "2009-01-01 01:00:00" "2009-01-01 02:00:00" "2009-01-01 03:00:00" ...
$ value: num -0.6 -0.7 -0.6 -0.4 -0.4 -0.3 -0.3 -0.3 -0.1 0 ...
So i have my Date column, which is in POSIXct format and in 1 hour steps. My value column is numeric and represents a temperature.
Now i want to delete entire days by a condition. The condition is that if only one cell within a day is lower than 3 (°C) i want to delete the day.
I searched for a while, but i cant solve it. So hopefully you can help me.
Thank you in advance
Upvotes: 0
Views: 173
Reputation: 5017
Try to adapt this code:
A toy dataframe (2009-01-01 has only 1 hour with a value<3, and 2009-01-02 none):
df<-data.frame(date=c("2009-01-01 00:00:00", "2009-01-01 01:00:00", "2009-01-01 02:00:00", "2009-01-02 03:00:00"),
+ value=c(-0.6, 8, 4, 7))
df
date value
1 2009-01-01 00:00:00 -0.6
2 2009-01-01 01:00:00 8.0
3 2009-01-01 02:00:00 4.0
4 2009-01-02 03:00:00 7.0
Identify days to be deleted
date_to_delete<-unique(as.Date(df[df[,"value"]<3,"date"], format="%Y-%m-%d"))
Your dataframe cleaned
df[!(as.Date(df$date,format="%Y-%m-%d") %in% date_to_delete),]
date value
4 2009-01-02 03:00:00 7
Upvotes: 0
Reputation: 171
Using Pasqui's example before it was edited and slightly modifying it...
I chose to build the logic around my interpretation that a day can only be deleted if and only if ONLY ONE cell/record is below 3ºC in a day. So if there are two, three, or more cell/records with below 3ºC in a day, it is preserved. In this example, only January 4 2009 among all days has just exactly ONE cell/record below 3ºC, so it was removed.
library(dplyr)
#Building an example data frame
df <- data.frame(
date = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
"2009-01-01 02:00:00", "2009-01-01 03:00:00",
"2009-01-01 04:00:00", "2009-01-01 05:00:00",
"2009-01-02 02:00:00", "2009-01-02 03:00:00",
"2009-01-03 04:00:00", "2009-01-03 02:00:00",
"2009-01-03 03:00:00", "2009-01-03 04:00:00",
"2009-01-04 00:00:00", "2009-01-04 01:00:00")),
temp = c(1, -0.7, -0.6,
-0.4, 3.5, 2.9, -0.4, -0.3,
-0.3, 10, 4,
0, 3.3, 2.5)
)
require(lubridate)
df2 <- df %>%
mutate(
day = date(date),
counter = 1
) %>%
group_by(day) %>%
filter(
if (sum(counter[temp < 3]) == 1) {
FALSE
} else {
TRUE
}
)
# A tibble: 12 x 4
# Groups: day [3]
date temp day counter
<dttm> <dbl> <date> <dbl>
1 2009-01-01 00:00:00 1.0 2009-01-01 1
2 2009-01-01 01:00:00 -0.7 2009-01-01 1
3 2009-01-01 02:00:00 -0.6 2009-01-01 1
4 2009-01-01 03:00:00 -0.4 2009-01-01 1
5 2009-01-01 04:00:00 3.5 2009-01-01 1
6 2009-01-01 05:00:00 2.9 2009-01-01 1
7 2009-01-02 02:00:00 -0.4 2009-01-02 1
8 2009-01-02 03:00:00 -0.3 2009-01-02 1
9 2009-01-03 04:00:00 -0.3 2009-01-03 1
10 2009-01-03 02:00:00 10.0 2009-01-03 1
11 2009-01-03 03:00:00 4.0 2009-01-03 1
12 2009-01-03 04:00:00 0.0 2009-01-03 1
Upvotes: 1
Reputation: 621
Compact dplyr
syntax
library(dplyr)
#Building an example data frame
df <- data.frame(
datetime = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
"2009-01-01 02:00:00", "2009-01-01 03:00:00",
"2009-01-02 02:00:00", "2009-01-02 03:00:00",
"2009-01-03 04:00:00", "2009-01-03 02:00:00",
"2009-01-03 03:00:00", "2009-01-03 04:00:00",
"2009-01-04 03:00:00", "2009-01-04 04:00:00")),
temp = c(1, -0.7, -0.6,
-0.4, -0.4, -0.3,
-0.3, 10, 4,
0, 10, 5))
#Query
df %>%
mutate(date = lubridate::as_date(datetime)) %>%
group_by(date) %>%
filter(all(temp > 3))
#Result
datetime temp date
<dttm> <dbl> <date>
1 2009-01-04 03:00:00 10. 2009-01-04
2 2009-01-04 04:00:00 5. 2009-01-04
Upvotes: 2