H.Maxi
H.Maxi

Reputation: 1

R- Deleting rows by condition of one cell

im really new at R and i have a problem to solve. I have a dataframe like this

str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   70128 obs. of  2 variables:
 $ date: POSIXct, format: "2009-01-01 00:00:00" "2009-01-01 01:00:00" "2009-01-01 02:00:00" "2009-01-01 03:00:00" ...
 $ value: num  -0.6 -0.7 -0.6 -0.4 -0.4 -0.3 -0.3 -0.3 -0.1 0 ...

So i have my Date column, which is in POSIXct format and in 1 hour steps. My value column is numeric and represents a temperature.

Now i want to delete entire days by a condition. The condition is that if only one cell within a day is lower than 3 (°C) i want to delete the day.

I searched for a while, but i cant solve it. So hopefully you can help me.

Thank you in advance

Upvotes: 0

Views: 173

Answers (3)

Terru_theTerror
Terru_theTerror

Reputation: 5017

Try to adapt this code:

A toy dataframe (2009-01-01 has only 1 hour with a value<3, and 2009-01-02 none):

    df<-data.frame(date=c("2009-01-01 00:00:00", "2009-01-01 01:00:00", "2009-01-01 02:00:00", "2009-01-02 03:00:00"),
+ value=c(-0.6, 8, 4, 7))
df
                 date value
1 2009-01-01 00:00:00  -0.6
2 2009-01-01 01:00:00   8.0
3 2009-01-01 02:00:00   4.0
4 2009-01-02 03:00:00   7.0

Identify days to be deleted

date_to_delete<-unique(as.Date(df[df[,"value"]<3,"date"], format="%Y-%m-%d"))

Your dataframe cleaned

df[!(as.Date(df$date,format="%Y-%m-%d") %in% date_to_delete),]
                 date value
4 2009-01-02 03:00:00     7

Upvotes: 0

Anonymous
Anonymous

Reputation: 171

Using Pasqui's example before it was edited and slightly modifying it...

I chose to build the logic around my interpretation that a day can only be deleted if and only if ONLY ONE cell/record is below 3ºC in a day. So if there are two, three, or more cell/records with below 3ºC in a day, it is preserved. In this example, only January 4 2009 among all days has just exactly ONE cell/record below 3ºC, so it was removed.

library(dplyr)

#Building an example data frame
df <- data.frame(
  date = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", 
                      "2009-01-01 02:00:00", "2009-01-01 03:00:00",
                      "2009-01-01 04:00:00", "2009-01-01 05:00:00",
                      "2009-01-02 02:00:00", "2009-01-02 03:00:00", 
                      "2009-01-03 04:00:00", "2009-01-03 02:00:00", 
                      "2009-01-03 03:00:00", "2009-01-03 04:00:00",
                      "2009-01-04 00:00:00", "2009-01-04 01:00:00")),

  temp = c(1, -0.7, -0.6,
           -0.4, 3.5, 2.9, -0.4, -0.3, 
           -0.3, 10, 4, 
           0, 3.3, 2.5)

)

require(lubridate)

df2 <- df %>% 
  mutate(
    day = date(date),
    counter = 1
  ) %>%
  group_by(day) %>%
  filter(
    if (sum(counter[temp < 3]) == 1) {
      FALSE
    } else {
      TRUE
    }
  )

# A tibble: 12 x 4
# Groups:   day [3]
                  date  temp        day counter
                <dttm> <dbl>     <date>   <dbl>
 1 2009-01-01 00:00:00   1.0 2009-01-01       1
 2 2009-01-01 01:00:00  -0.7 2009-01-01       1
 3 2009-01-01 02:00:00  -0.6 2009-01-01       1
 4 2009-01-01 03:00:00  -0.4 2009-01-01       1
 5 2009-01-01 04:00:00   3.5 2009-01-01       1
 6 2009-01-01 05:00:00   2.9 2009-01-01       1
 7 2009-01-02 02:00:00  -0.4 2009-01-02       1
 8 2009-01-02 03:00:00  -0.3 2009-01-02       1
 9 2009-01-03 04:00:00  -0.3 2009-01-03       1
10 2009-01-03 02:00:00  10.0 2009-01-03       1
11 2009-01-03 03:00:00   4.0 2009-01-03       1
12 2009-01-03 04:00:00   0.0 2009-01-03       1

Upvotes: 1

Pasqui
Pasqui

Reputation: 621

Compact dplyr syntax

library(dplyr)

#Building an example data frame
df <- data.frame(
datetime = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", 
                    "2009-01-01 02:00:00", "2009-01-01 03:00:00",
                    "2009-01-02 02:00:00", "2009-01-02 03:00:00", 
                    "2009-01-03 04:00:00", "2009-01-03 02:00:00", 
                    "2009-01-03 03:00:00", "2009-01-03 04:00:00",
                    "2009-01-04 03:00:00", "2009-01-04 04:00:00")),

temp = c(1, -0.7, -0.6,
         -0.4, -0.4, -0.3, 
         -0.3, 10, 4, 
         0, 10, 5))

#Query
df %>% 
    mutate(date = lubridate::as_date(datetime)) %>% 
    group_by(date) %>% 
    filter(all(temp > 3))

#Result
      datetime             temp date      
  <dttm>              <dbl> <date>    
1 2009-01-04 03:00:00   10. 2009-01-04
2 2009-01-04 04:00:00    5. 2009-01-04

Upvotes: 2

Related Questions