Ennyization

Reputation: 25

Summing a column to a certain value

I have a data.frame with 2 variables and 177 observations. I would like to sum one variable until it reaches a certain value, and then get the value of the other variable at the row where that threshold is reached. I will try to add a reproducible example. I am new here, so forgive me if I do it wrong.

> df <- data.frame(x=10:1,y=1:10)
> print(df)
    x  y
1  10  1
2   9  2
3   8  3
4   7  4
5   6  5
6   5  6
7   4  7
8   3  8
9   2  9
10  1 10

How can I sum column y until it reaches a certain value, say 7, and then have it return either the value of x (4) or the row number (7)? I am sure it is pretty straightforward, but I seem to be drawing a blank.

Upvotes: 2

Views: 112

Answers (3)

s-v-r

Reputation: 166

If you want to stay with base R, try this:

> df$x[df$y >= 7][1]
[1] 4
> max(cumsum(df$y[df$y <= 7]))
[1] 28

Or if you need this in a matrix form:

> cbind(df$x[df$y >= 7][1], max(cumsum(df$y[df$y <= 7])))
     [,1] [,2]
[1,]    4   28
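
Note that `df$y >= 7` compares each value of y with 7, not the running total of y. A minimal sketch contrasting the two readings of the question, using the df from the question (the names `by_value` and `by_running_total` are illustrative only):

```r
df <- data.frame(x = 10:1, y = 1:10)

# Reading 1: first row where y itself is at least 7
# (this is what df$x[df$y >= 7][1] computes)
by_value <- df$x[df$y >= 7][1]

# Reading 2: first row where the running total of y is at least 7
by_running_total <- df$x[cumsum(df$y) >= 7][1]

by_value          # 4
by_running_total  # 7
```

Which reading is wanted depends on whether "sum up to a value" means the cumulative sum or the value of y itself.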

I would still look into switching to the data.table package, or at least dplyr, for data manipulation.

Upvotes: 0

Nick Kennedy

Reputation: 12640

The OP just asked for the relevant value of x, which can be obtained with:

df$x[which(cumsum(df$y) >= 10)[1]]

Also note that this finds the first row where cumsum(df$y) is at least 10, whereas the other answers find the last row where it is <= 7, which is potentially different (though not for this dataset). For the original question (pre-comment) it would need to be:

df$x[which(cumsum(df$y) > 7)[1]]
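
If the threshold changes often, the same idea could be wrapped in a small function. This is only a sketch: `first_row_reaching` is a hypothetical helper name, and it assumes "reaches" means the running total is at least the threshold (`>=`); swap in `>` for the strict version.

```r
df <- data.frame(x = 10:1, y = 1:10)

# Hypothetical helper: index of the first element at which the running
# total of `values` is at least `threshold`; NA if it never gets there.
first_row_reaching <- function(values, threshold) {
  hits <- which(cumsum(values) >= threshold)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

row <- first_row_reaching(df$y, 7)
row         # 4
df$x[row]   # 7

# A threshold larger than sum(df$y) = 55 is never reached
first_row_reaching(df$y, 100)   # NA
```

Returning NA rather than an empty vector means `df$x[row]` also yields NA when the threshold is never reached, which is easy to check with `is.na()`.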

Upvotes: 0

SabDeM

Reputation: 7200

Here is my solution.

df[cumsum(df$y) <= 7,]
   x y
1 10 1
2  9 2
3  8 3
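
This keeps every row whose running total is still at most 7. If only the boundary is needed, a rough sketch along the same lines (the names `running`, `n_within` and `next_row` are illustrative, and the step to the next row assumes y has no negative values, so the running total never decreases):

```r
df <- data.frame(x = 10:1, y = 1:10)

running <- cumsum(df$y)          # 1 3 6 10 15 21 28 36 45 55

# Number of rows whose running total is still <= 7
n_within <- sum(running <= 7)
n_within                         # 3

# Last row within the limit
last_within <- tail(df[running <= 7, ], 1)
last_within$x                    # 8

# Row at which the running total first exceeds 7, if there is one
next_row <- n_within + 1
if (next_row <= nrow(df)) df$x[next_row]   # 7
```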

Upvotes: 2
