Reputation: 185
I want to create a column col2
which lists the distance in rows of each element in col1
to the next value in col1
which is equal to or lower.
Where there are no more elements which are equal or lower, it give the distance to the end of the colummn.
Example dataframe:
df1 <- data.frame(col1 = c(1,2,2,1,2,3,2))
col1
1 1
2 2
3 2
4 1
5 2
6 3
7 2
Desired output in this example:
col1 col2
1 1 3
2 2 1
3 2 1
4 1 4
5 2 3
6 3 1
7 2 1
My attempt so far is as per below. While this seems to work when I try it with vectors, it isn't working in the function.
This is to input into a broader function which I'm trying to write.
df1 |>
mutate(
col2 = sapply(row_number(), \(x) {
distance <- match(TRUE, col1[x] <= col1[(x+1):n()])
distance
})
)
Upvotes: 0
Views: 34
Reputation: 66880
My approach is to first add row numbers as an explicit row, since you're calculating on those. Then I do a rolling self-join, finding the next row that is smaller or equal to the current col1
. Then I take the first NA between either the row difference or the number of remaining rows.
library(dplyr)
df2 <- df1 |> mutate(row = row_number())
df2 |>
left_join(df2, join_by(closest(row < row), col1 >= col1)) |>
transmute(col1 = col1.x, col2 = coalesce(row.y - row.x, n() - row.x + 1))
Result
col1 col2
1 1 3
2 2 1
3 2 1
4 1 4
5 2 2
6 3 1
7 2 1
Upvotes: 2
Reputation: 4182
The main problem with the current approach is handling cases where there are no more lower/equal values - in those cases, we want the distance to the end of the dataframe. Your first 4 should be a 3, since it takes 3 rows to be at 1 again which is equal or lower.
df1 |>
mutate(
col2 = sapply(seq_len(n()), \(x) {
matches <- which(col1[-(1:x)] <= col1[x])
if (length(matches)) matches[1] else n() - x + 1
})
)
col1 | col2 |
---|---|
1 | 3 |
2 | 1 |
2 | 1 |
1 | 4 |
2 | 2 |
3 | 1 |
2 | 1 |
Upvotes: 1