Reputation: 1469
This title is really horrible, but I just cant find a solution to this. I have a dataframe that looks this this:
structure(list(PIFF_ID = c(371700, 371700, 371700, 371700, 371700,
371700, 371700, 371700, 371700, 371700, 371700, 371700, 371700,
371700, 371700, 371700, 371700, 371700, 371700, 371700, 450200,
450200, 450200, 450200, 450200, 450200, 450200, 450200, 450200,
450200, 450200, 450200, 450200, 450200, 450200, 450200, 450200,
450200, 450200, 450200, 1600200, 1600200, 1600200, 1600200, 1600200,
1600200, 1600200, 1600200, 1600200, 1600200, 1600200, 1600200,
1600200, 1600200, 1600200, 1600200, 1600200, 1600200, 1600200,
1600200, 2060200, 2060200, 2060200, 2060200, 2060200, 2060200,
2060200, 2060200, 2060200, 2060200, 2060200, 2060200, 2060200,
2060200, 2060200, 2060200, 2060200, 2060200, 2060200, 2060200
), days_before_event = c(180L, 179L, 178L, 177L, 176L, 175L,
174L, 173L, 172L, 171L, 170L, 169L, 168L, 167L, 166L, 165L, 164L,
163L, 162L, 161L, 180L, 179L, 178L, 177L, 176L, 175L, 174L, 173L,
172L, 171L, 170L, 169L, 168L, 167L, 166L, 165L, 164L, 163L, 162L,
161L, 180L, 179L, 178L, 177L, 176L, 175L, 174L, 173L, 172L, 171L,
170L, 169L, 168L, 167L, 166L, 165L, 164L, 163L, 162L, 161L, 180L,
179L, 178L, 177L, 176L, 175L, 174L, 173L, 172L, 171L, 170L, 169L,
168L, 167L, 166L, 165L, 164L, 163L, 162L, 161L)), row.names = c(NA,
-80L), groups = structure(list(PIFF_ID = c(371700, 450200, 1600200,
2060200), .rows = structure(list(1:20, 21:40, 41:60, 61:80), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
What I want to do, and I cannot figure out how, is the following.
For each group of PIFF_ID
I want to add a column that joins together 18-consequent rows and gives them an id. This is supposed to start (having the lowest ID for each PIFF_ID
) at the lowest value of days_before_event
.
So the result should look more or less like this (which I created manually):
df_manual = data.frame(
PIFF_ID = rep(c("a", "b"), each = 36),
days_before_event = rep(36:1, 2),
window_18_days = rep(2:1, each = 18)
)
PIFF_ID days_before_event window_18_days
1 a 36 2
2 a 35 2
3 a 34 2
4 a 33 2
5 a 32 2
6 a 31 2
7 a 30 2
8 a 29 2
9 a 28 2
10 a 27 2
11 a 26 2
12 a 25 2
13 a 24 2
14 a 23 2
15 a 22 2
16 a 21 2
17 a 20 2
18 a 19 2
19 a 18 1
20 a 17 1
21 a 16 1
22 a 15 1
23 a 14 1
24 a 13 1
25 a 12 1
26 a 11 1
27 a 10 1
28 a 9 1
29 a 8 1
30 a 7 1
31 a 6 1
32 a 5 1
33 a 4 1
34 a 3 1
35 a 2 1
36 a 1 1
In my real dataset the row-numbers are also not always a divisible without remainder by 18. So the last rows could end with a window_18_days
-value of anything between 1 and 18.
Upvotes: 0
Views: 40
Reputation: 186
This is one possible solution using only base R
Basically I split the data frame by PIFF_ID and then see in how many 18 line chunks I can split the data. Then I repeat the numbers from 1 to the number of chunks 18 times and add it to the result. In the end everything is merged back into a df
This should also work if one of your PIFF_ID goups contains not a multiple of 18 data points
added_automatic_annotation <- bind_rows(lapply(unique(test$PIFF_ID), function(x){
# selecting the current PIFF_ID
sub_df <- test[test$PIFF_ID == x, ]
# how often to repeat each label (allows for the final group to be shorter than 18)
reps <- sapply(split(sub_df$PIFF_ID, ceiling(seq_along(sub_df$PIFF_ID)/18)), function(x) length(x))
# sorting the df
sub_df <- sub_df[order(sub_df$days_before_event, decreasing = T), ]
# add the anntotation by repeating each level 18 or less times (starting at the end because of the decending order imposed above)
sub_df$automatic_label <- rep(c(length(reps):1), times = reps)
return(sub_df)
}))
Upvotes: 1