Subset a data frame based on the cumulative sum of a column

Question

I have a df that looks like this:

> df2
  name      value
1    a 0.20019421
2    b 0.17996454
3    c 0.14257010
4    d 0.14257010 
5    e 0.11258865
6    f 0.07228970
7    g 0.05673759
8    h 0.05319149
9    i 0.03989362

I would like to subset it using the cumulative sum of the column value, i.e., starting from the first row, I want to extract rows until the running sum of value reaches at least 0.6. My desired output is:

> df2
  name      value
1    a 0.20019421
2    b 0.17996454
3    c 0.14257010
4    d 0.14257010

I have tried df2[, colSums[,5]>=0.6], but obviously colSums is expecting an array.

Thanks in advance

Sven Hohenstein · Accepted Answer

Here's an approach:

 df2[seq(which(cumsum(df2$value) >= 0.6)[1]), ]

The result:

  name     value
1    a 0.2001942
2    b 0.1799645
3    c 0.1425701
4    d 0.1425701
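
To make the one-liner easier to follow, here is the same idea broken into steps, as a minimal sketch. The data frame is rebuilt from the values in the question; the names running, first_hit and n_keep are only illustrative, and the guard for the case where the sum never reaches the threshold is my own assumption, not part of the answer above (without it, which(...)[1] is NA and seq() fails).

```r
# Rebuild the example data from the question
df2 <- data.frame(
  name  = letters[1:9],
  value = c(0.20019421, 0.17996454, 0.14257010, 0.14257010, 0.11258865,
            0.07228970, 0.05673759, 0.05319149, 0.03989362)
)

# Running total of value, starting from the first row
running <- cumsum(df2$value)

# Index of the first row whose running total reaches 0.6 (NA if none does)
first_hit <- which(running >= 0.6)[1]

# Assumed fallback: keep every row if the threshold is never reached
n_keep <- if (is.na(first_hit)) nrow(df2) else first_hit

# Keep rows 1..n_keep
result <- df2[seq_len(n_keep), ]
```

With the data above, the running total is still below 0.6 after row c and passes it at row d, so result holds rows a through d. The printed values show fewer digits than in the question only because of R's default print precision; the stored values are unchanged.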
