aqkhan
aqkhan

Reputation: 86

Removing the max 3 dates rows R

I would like to know how to remove the rows with the latest 3 max dates. I get my data through an API so it's dynamic and can't just filter rows from my data frame with specific dates because the latest dates keep changes.

My Data looks like this:

 date       ptppAll mAverage
   <date>       <dbl>    <dbl>
 1 2020-03-01    1.71    NA   
 2 2020-03-02    7.82    NA   
 3 2020-03-03    9.81    NA   
 4 2020-03-04    1.71     4.23
 5 2020-03-05    3.42     4.72
 6 2020-03-06    0        4.68
 7 2020-03-07    5.13     6.19
 8 2020-03-08    5.13     6.53
 9 2020-03-09    7.54     6.53
10 2020-03-10   20.4      8.04

In the above data, assuming that 2020-03-10 is the max date, I'd like to remove it alongside rows containing dates 2020-03-09, 2020-03-08. So from my example, the code would like this:

   date       ptppAll mAverage
   <date>       <dbl>    <dbl>
 1 2020-03-01    1.71    NA   
 2 2020-03-02    7.82    NA   
 3 2020-03-03    9.81    NA   
 4 2020-03-04    1.71     4.23
 5 2020-03-05    3.42     4.72
 6 2020-03-06    0        4.68
 7 2020-03-07    5.13     6.19

Upvotes: 0

Views: 83

Answers (3)

akrun
akrun

Reputation: 886948

Using base R

head(data[order(data$date),], -3)

data

data <- structure(list(date = c("2020-03-01", "2020-03-02", "2020-03-03", 
                            "2020-03-04", "2020-03-05", "2020-03-06", "2020-    03-07", "2020-03-08", 
                            "2020-03-09", "2020-03-10"), 
                   ptppAll = c(1.71, 7.82, 9.81, 1.71, 3.42, 0, 5.13, 5.13, 7.54, 20.4),
                   mAverage = c(NA, NA, NA, 4.23, 4.72, 4.68, 6.19, 6.53, 6.53, 8.04)), class = "data.frame",
              row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

Upvotes: 0

r.user.05apr
r.user.05apr

Reputation: 5456

You simply can use filter:

library(tidyverse)

data <- structure(list(date = c("2020-03-01", "2020-03-02", "2020-03-03", 
                            "2020-03-04", "2020-03-05", "2020-03-06", "2020-    03-07", "2020-03-08", 
                            "2020-03-09", "2020-03-10"), 
                   ptppAll = c(1.71, 7.82, 9.81, 1.71, 3.42, 0, 5.13, 5.13, 7.54, 20.4),
                   mAverage = c(NA, NA, NA, 4.23, 4.72, 4.68, 6.19, 6.53, 6.53, 8.04)), class = "data.frame",
              row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))


data %>%
  filter(date %in% head(sort(unique(.$date)), -3))

# date ptppAll mAverage
# 1 2020-03-01    1.71       NA
# 2 2020-03-02    7.82       NA
# 3 2020-03-03    9.81       NA
# 4 2020-03-04    1.71     4.23
# 5 2020-03-05    3.42     4.72
# 6 2020-03-06    0.00     4.68
# 7 2020-03-07    5.13     6.19

Upvotes: 0

Waldi
Waldi

Reputation: 41210

try:

library(dplyr)

data <- structure(list(date = c("2020-03-01", "2020-03-02", "2020-03-03", 
                        "2020-03-04", "2020-03-05", "2020-03-06", "2020-03-07", "2020-03-08", 
                        "2020-03-09", "2020-03-10"), 
                       ptppAll = c(1.71, 7.82, 9.81, 1.71, 3.42, 0, 5.13, 5.13, 7.54, 20.4),
                       mAverage = c(NA, NA, NA, 4.23, 4.72, 4.68, 6.19, 6.53, 6.53, 8.04)), class = "data.frame",
                       row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

data %>% arrange(date) %>% head(-3)
#>         date ptppAll mAverage
#> 1 2020-03-01    1.71       NA
#> 2 2020-03-02    7.82       NA
#> 3 2020-03-03    9.81       NA
#> 4 2020-03-04    1.71     4.23
#> 5 2020-03-05    3.42     4.72
#> 6 2020-03-06    0.00     4.68
#> 7 2020-03-07    5.13     6.19

Created on 2020-08-20 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions