Reputation: 39

count number of rows needed to have sum greater than a particular value in R

I want to subset dataframe such that no of rows needed to get mpg value is at least 100.

library(datasets)

data(mtcars)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The output should be top 5 values here mpg sum is >100 after Hornet Sportabout

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

I want to the checksum at each row for the mpg column and then the output as no of rows it took to get that sum of at least 100

Upvotes: 1

Answers (5)

eipi10

Reputation: 93851

You can use a filter condition with dplyr:

library(tidyverse)

mtcars %>% 
  filter(row_number() %in% 1:(max(which(cumsum(mpg) < 100)) + 1))

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

The code can be shortened with slice:

mtcars %>% 
  slice(1:(max(which(cumsum(mpg) < 100)) + 1))

And packaged as a function:

fnc = function(data, var, cutoff) {
  data %>% 
    slice(1:(max(which(cumsum({{var}}) < cutoff)) + 1))
}

fnc(mtcars, mpg, 100)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

fnc(iris, Sepal.Width, 10)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa

Upvotes: 0

jmpivette

Reputation: 275

I would use cumsum in association with lag

library(dplyr)
    
mtcars %>% 
      filter(cumsum(lag(mpg, default = 0)) < 100)

Upvotes: 1

LMc

Reputation: 18662

You can also use the purrr library:

library(purrr)
which.max(purrr::accumulate(mtcars$mpg, `+`) > 100)

# 5

If you want the whole dataset you can use dplyr::slice:

library(tidyverse)

dplyr::slice(mtcars, 1 : which.max(purrr::accumulate(mtcars$mpg, `+`) > 100))

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

Upvotes: 0

ThomasIsCoding

Reputation: 102211

A base R option using subset + cumsum

subset(mtcars, c(TRUE, cumsum(mpg) <= 100)[-nrow(mtcars)])

gives

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

Upvotes: 0

Bruno

Reputation: 4150

This should solve it

library(tidyverse)


df_answer <- mtcars %>% 
  rownames_to_column() %>% 
  tibble() %>% 
  mutate(cum_sum = cumsum(mpg)) %>% 
  filter(cum_sum < 100)

df_answer %>% 
  nrow() + 1

Upvotes: 0

count number of rows needed to have sum greater than a particular value in R

Answers (5)

Related Questions