Reputation: 25
I have a data.frame with 2 variables and 177 observations. I would like to sum up one variable until it reaches a certain value, and then get the value of the other variable at the point where that threshold is reached. I will try to add a reproducible example. I am new here, so forgive me if I do it wrong.
> df <- data.frame(x=10:1,y=1:10)
> print(df)
    x  y
1  10  1
2   9  2
3   8  3
4   7  4
5   6  5
6   5  6
7   4  7
8   3  8
9   2  9
10  1 10
How can I sum column y until it reaches a certain value, let's say 7, and then either have it return the value of x (4), or the row number 7? I am sure it is pretty straightforward, but I seem to be drawing a blank.
Upvotes: 2
Views: 112
Reputation: 166
If you want to stay with base R, try this:
> df$x[df$y >= 7][1]
[1] 4
> max(cumsum(df$y[df$y <= 7]))
[1] 28
Or if you need this in a matrix form:
> cbind(df$x[df$y >= 7][1], max(cumsum(df$y[df$y <= 7])))
     [,1] [,2]
[1,]    4   28
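To see where the 4 and the 28 come from, here is a rough step-by-step breakdown of the two expressions above, assuming the df from the question. The intermediate variable names are only for illustration.

```r
# Data from the question
df <- data.frame(x = 10:1, y = 1:10)

# First expression: x for the first row whose y is at least 7
keep_hi <- df$y >= 7        # TRUE for rows 7 to 10
x_hi <- df$x[keep_hi]       # 4 3 2 1
first_x <- x_hi[1]          # 4

# Second expression: running total over the y values that are at most 7
y_lo <- df$y[df$y <= 7]     # 1 2 3 4 5 6 7
totals <- cumsum(y_lo)      # 1 3 6 10 15 21 28
max_total <- max(totals)    # 28
```

Note that both filters compare the individual y values against 7, not the running total of y.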
I would still look into switching to the data.table or at least dplyr packages for data manipulation.
Upvotes: 0
Reputation: 12640
The OP just asked for the relevant value of x which would be done using:
df$x[which(cumsum(df$y) >= 10)[1]]
Also note that this finds the first row where cumsum(df$y) is at least 10, whereas the other answers find the last value <= 7, which is potentially different (though not for this dataset). For the original question (pre-comment), it would need to be:
df$x[which(cumsum(df$y) > 7)[1]]
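If the threshold changes often, the same lookup could be wrapped in a small function. This is only a sketch, assuming the df from the question. The helper name first_row_reaching and its strict argument are made up here and do not come from any package.

```r
# Data from the question
df <- data.frame(x = 10:1, y = 1:10)

# Hypothetical helper: index of the first row where the running total of
# `values` reaches `threshold` (>=), or exceeds it (>) when strict = TRUE.
# Returns NA if the running total never gets there.
first_row_reaching <- function(values, threshold, strict = FALSE) {
  running <- cumsum(values)
  hit <- if (strict) running > threshold else running >= threshold
  which(hit)[1]
}

idx <- first_row_reaching(df$y, 10)   # running totals are 1, 3, 6, 10, ...
x_at_idx <- df$x[idx]                 # value of x in that row
```

With this data, idx is 4 and x_at_idx is 7. Calling first_row_reaching(df$y, 7, strict = TRUE) also gives 4, because the running total jumps from 6 to 10.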
Upvotes: 0