Sean
Sean

Reputation: 185

Create column giving next occurrence of equal or smaller value in other column

I want to create a column col2 which lists the distance in rows of each element in col1 to the next value in col1 which is equal to or lower.

Where there are no more elements which are equal or lower, it give the distance to the end of the colummn.

Example dataframe:

df1 <- data.frame(col1 = c(1,2,2,1,2,3,2))

  col1
1    1
2    2
3    2
4    1
5    2
6    3
7    2

Desired output in this example:

  col1 col2
1    1    3
2    2    1
3    2    1
4    1    4
5    2    3
6    3    1
7    2    1

My attempt so far is as per below. While this seems to work when I try it with vectors, it isn't working in the function.

This is to input into a broader function which I'm trying to write.

df1 |>
  mutate(
      col2 = sapply(row_number(), \(x) {
      distance <- match(TRUE, col1[x] <= col1[(x+1):n()]) 
      distance
    })
  )

Upvotes: 0

Views: 34

Answers (2)

Jon Spring
Jon Spring

Reputation: 66880

My approach is to first add row numbers as an explicit row, since you're calculating on those. Then I do a rolling self-join, finding the next row that is smaller or equal to the current col1. Then I take the first NA between either the row difference or the number of remaining rows.

library(dplyr)
df2 <- df1 |> mutate(row = row_number())
df2 |>
  left_join(df2, join_by(closest(row < row), col1 >= col1)) |>
  transmute(col1 = col1.x, col2 = coalesce(row.y - row.x, n() - row.x + 1))

Result

  col1 col2
1    1    3
2    2    1
3    2    1
4    1    4
5    2    2
6    3    1
7    2    1

Upvotes: 2

Tim G
Tim G

Reputation: 4182

The main problem with the current approach is handling cases where there are no more lower/equal values - in those cases, we want the distance to the end of the dataframe. Your first 4 should be a 3, since it takes 3 rows to be at 1 again which is equal or lower.

df1 |>
  mutate(
    col2 = sapply(seq_len(n()), \(x) {
      matches <- which(col1[-(1:x)] <= col1[x])
      if (length(matches)) matches[1] else n() - x + 1
    })
  )
col1 col2
1 3
2 1
2 1
1 4
2 2
3 1
2 1

Upvotes: 1

Related Questions