denis

Reputation: 802

Repeat a sequential numbering of duplicated values starting from a given value in R

In a new variable row2, how can I repeat a sequential numbering (here the sequence 3 to 6) across groups of duplicated row1 values, starting from a given value (here row1 = 3), even if the last sequence is incomplete (here 3 to 5, for example)?

Thanks for help

Desired output:

> dat1
   row1 row2
1     1    1
2     1    1
3     2    2
4     3    3 # start the sequence
5     4    4
6     4    4
7     4    4
8     5    5
9     5    5
10    6    6
11    6    6
12    6    6
13    7    3 # repeat the sequence
14    7    3
15    8    4
16    8    4
17    9    5
18    9    5
19    9    5
20   10    6
21   11    3 # and repeat again...
22   11    3
23   11    3
24   12    4
25   13    5
26   13    5 # ...even if incomplete 

Initial data:

row1 <- c(1,1,2,
          3,4,4,4,5,5,6,6,6,
          7,7,8,8,9,9,9,10,
          11,11,11,12,13,13)
dat1 <- data.frame(row1)

Upvotes: 1

Views: 95

Answers (3)

Friede

Reputation: 7979

You might want to write a more concise version, starting from

dat1 |>
  transform(row2 = {
    i = row1 < 3
    c(row1[i], with(rle(row1[!i]), rep(rep(3:6, length.out=length(lengths)), lengths)))
})
   row1 row2
1     1    1
2     1    1
3     2    2
4     3    3
5     4    4
6     4    4
7     4    4
8     5    5
9     5    5
10    6    6
11    6    6
12    6    6
13    7    3
14    7    3
15    8    4
16    8    4
17    9    5
18    9    5
19    9    5
20   10    6
21   11    3
22   11    3
23   11    3
24   12    4
25   13    5
26   13    5
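
To make the one-liner above easier to follow, here is a minimal sketch that breaks it into steps on the question's row1 vector. The intermediate names (i, r, labels) are only illustrative. Like the original, it assumes all rows with row1 < 3 come before the rest, because the two parts are simply concatenated.

```r
row1 <- c(1,1,2,
          3,4,4,4,5,5,6,6,6,
          7,7,8,8,9,9,9,10,
          11,11,11,12,13,13)

i <- row1 < 3           # rows that keep their original value
r <- rle(row1[!i])      # runs of identical values from row1 = 3 onwards
r$lengths               # 1 3 2 3 2 2 3 1 3 1 2 (11 runs)

# One label per run: recycle 3:6 and cut the last cycle short if needed
labels <- rep(3:6, length.out = length(r$lengths))

# Expand each label to the length of its run and prepend the untouched rows
row2 <- c(row1[i], rep(labels, r$lengths))
row2
```

Because the labels are assigned per run rather than computed from the values themselves, this should also work when the row1 values are not consecutive integers.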

If you would like to apply this to the data from your previous question, we can wrap the operations that depend on the group pdf in a single tapply() or by() call, e.g.

tapply(dat0, ~pdf, \(x) {
  x$row1 = with(rle(x$row0), rep(seq_along(values), lengths))
  x$row2 = c(x$row1[x$row1 < 3], with(rle(x$row1[!x$row1 < 3]), rep(rep(3:6, length.out=length(lengths)), lengths)))
  x
  }) |> do.call(what='rbind') |> `row.names<-`(NULL) # cosmetics

if this

   pdf page row0 row1 row2
1    x    3    5    1    1
2    x    3    5    1    1
3    x    3    5    1    1
4    x    3    5    1    1
5    x    3    6    2    2
6    x    3    6    2    2
7    x    3    6    2    2
8    x    3    7    3    3
9    x    3    7    3    3
10   x    4    1    4    4
11   x    4    1    4    4
12   x    4    1    4    4
13   x    4    2    5    5
14   x    4    2    5    5
15   x    4    2    5    5
16   x    4    2    5    5
17   x    4    3    6    6
18   y    6    2    1    1
19   y    6    2    1    1
20   y    6    3    2    2
21   y    6    3    2    2
22   y    6    3    2    2
23   y    6    4    3    3
24   y    6    4    3    3
25   y    7    1    4    4
26   y    7    1    4    4
27   y    7    1    4    4
28   y    7    1    4    4
29   y    7    2    5    5
30   y    7    2    5    5
31   y    7    2    5    5
32   y    7    3    6    6
33   y    8    1    7    3
34   y    8    1    7    3
35   y    8    2    8    4

is the desired result. Have a look at rows 32-35. (It might be better to rename row0-3 to col0-3.)

The first line of the anonymous function is very useful on its own. We can wrap it in a custom function:

consecutive_id = \(x) with(rle(x), rep(seq_along(values), lengths))
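
As a quick usage sketch, the helper can be tried on the row0 values of group x from the table above. The function is restated here so the snippet runs on its own. The input vector is copied from that table and is otherwise just an illustration.

```r
# Number each run of identical adjacent values 1, 2, 3, ...
consecutive_id <- function(x) with(rle(x), rep(seq_along(values), lengths))

row0 <- c(5, 5, 5, 5, 6, 6, 6, 7, 7, 1, 1, 1, 2, 2, 2, 2, 3)
consecutive_id(row0)
# 1 1 1 1 2 2 2 3 3 4 4 4 5 5 5 5 6
```

Note that dplyr (1.1.0 and later) exports a function with the same name, so a different name may be preferable if dplyr is attached.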

Upvotes: 1

Tim G

Reputation: 4147

You could use if_else() to apply a modulo to values >= 3:

  1. (row1 - 3) %% 4 cycles through 0,1,2,3, so that after the shift below the row1 values map onto 3,4,5,6 repeatedly.

  2. +3 shifts the sequence to start at 3.

  3. Values of row1 < 3 are kept untouched


library(dplyr) # provides if_else()

dat1$row2 <- if_else(dat1$row1 >= 3, (dat1$row1 - 3) %% 4 + 3, dat1$row1)

   row1 row2
1     1    1
2     1    1
3     2    2
4     3    3
5     4    4
6     4    4
7     4    4
8     5    5
9     5    5
10    6    6
11    6    6
12    6    6
13    7    3
14    7    3
15    8    4
16    8    4
17    9    5
18    9    5
19    9    5
20   10    6
21   11    3
22   11    3
23   11    3
24   12    4
25   13    5
26   13    5
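
The arithmetic can be checked on the distinct values alone. The sketch below uses base ifelse() instead of dplyr::if_else() so it runs without extra packages. The vectors vals and gappy are illustrative and not part of the original answer.

```r
vals <- 1:13
mapped <- ifelse(vals >= 3, (vals - 3) %% 4 + 3, vals)
rbind(vals, mapped)
# vals   1 2 3 4 5 6 7 8 9 10 11 12 13
# mapped 1 2 3 4 5 6 3 4 5  6  3  4  5

# The mapping depends on the values themselves, not on their position,
# so a gap in row1 shifts the cycle: 7 maps to 3, not to 5.
gappy <- c(3, 4, 7)
mapped_gappy <- (gappy - 3) %% 4 + 3
mapped_gappy
# 3 4 3
```

In the question's data row1 runs through consecutive integers, so the value-based and position-based mappings agree there.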

Upvotes: 2

TarJae

Reputation: 79184

We can do it like this:

  1. For each row, if the value in row1 is less than 3, row2 is set equal to row1.
  2. For rows where row1 is 3 or greater, we compute a cyclic sequence as follows.
  3. sort(unique(row1)) extracts the sorted unique values of row1.
  4. %>% .[. >= 3] filters the sorted vector to only include values greater than or equal to 3.
  5. match(row1, ...) gives the position of each row's group within that filtered vector.
  6. ( ... - 1) makes the position zero-based.
  7. %% 4 cycles the position through 0, 1, 2, 3 with the modulo operator.
  8. + 3 finally shifts the result to the desired start value.

library(dplyr)

dat1 %>%
  mutate(row2 = if_else(row1 < 3,
                        row1,  
                        (
                          (match(
                          row1, sort(unique(row1)) %>% 
                                  .[. >= 3]) - 1) %% 4
                          ) + 3
                        )
         )
   row1 row2
1     1    1
2     1    1
3     2    2
4     3    3
5     4    4
6     4    4
7     4    4
8     5    5
9     5    5
10    6    6
11    6    6
12    6    6
13    7    3
14    7    3
15    8    4
16    8    4
17    9    5
18    9    5
19    9    5
20   10    6
21   11    3
22   11    3
23   11    3
24   12    4
25   13    5
26   13    5
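
The same steps can be sketched in base R with one intermediate variable per step, which may make the nested expression easier to inspect. The names groups, pos and cyc are only illustrative, and ifelse() stands in for dplyr::if_else().

```r
row1 <- c(1,1,2,
          3,4,4,4,5,5,6,6,6,
          7,7,8,8,9,9,9,10,
          11,11,11,12,13,13)

groups <- sort(unique(row1))       # sorted unique values: 1 2 3 ... 13
groups <- groups[groups >= 3]      # keep only values >= 3: 3 4 ... 13
pos    <- match(row1, groups)      # 1-based group position, NA where row1 < 3
cyc    <- (pos - 1) %% 4 + 3       # zero-index, cycle with modulo 4, shift to start at 3
row2   <- ifelse(row1 < 3, row1, cyc)  # rows below 3 keep their original value
row2
```

Since the cycle is driven by the position from match() rather than by the value itself, gaps in the row1 values should not disturb the 3 to 6 sequence.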

Upvotes: 2
