Reputation: 646
I have the following consultation. Suppose I have the following base with two columns, one with the ID and one with the value.
What I need is to create an additional column (value_ok) with the following logic: For each ID that has the same letter, no ID with a higher value should have an amount greater than a lower value ID. If this happens, it should be replaced by the same value.
db<-data.frame(id=c("A_1","A_2","A_3","A_4","B_1","B_2","B_3","B_4","C_1","C_2","C_3","C_4","D_1","D_2","D_3","D_4","E_1","E_4"),
value=c(10,9,8,7,7,8,9,5,15,30,14,20,10,10,10,20,30,40),
value_ok=c(10,9,8,7,9,9,9,5,30,30,20,20,20,20,20,20,40,40))
Here is an example of the values and the final desired output.
id value value_ok
1 A_1 10 10
2 A_2 9 9
3 A_3 8 8
4 A_4 7 7
5 B_1 7 9
6 B_2 8 9
7 B_3 9 9
8 B_4 5 5
9 C_1 15 30
10 C_2 30 30
11 C_3 14 20
12 C_4 20 20
13 D_1 10 20
14 D_2 10 20
15 D_3 10 20
16 D_4 20 20
17 E_1 30 40
18 E_4 40 40
Can someone helps me with this task?
Thanks!
Upvotes: 1
Views: 26
Reputation: 66819
You can do...
library(data.table)
setDT(db)
db[.N:1, v := cummax(value), by=sub("^(.+)_(.+)$", "\\1", id)]
id value value_ok v
1: A_1 10 10 10
2: A_2 9 9 9
3: A_3 8 8 8
4: A_4 7 7 7
5: B_1 7 9 9
6: B_2 8 9 9
7: B_3 9 9 9
8: B_4 5 5 5
9: C_1 15 30 30
10: C_2 30 30 30
11: C_3 14 20 20
12: C_4 20 20 20
13: D_1 10 20 20
14: D_2 10 20 20
15: D_3 10 20 20
16: D_4 20 20 20
17: E_1 30 40 40
18: E_4 40 40 40
The .N:1
sorts the table from last to first temporarily. by=
groups rows. v := cummax(value)
makes a new column with the cumulative max within each group.
The very ugly expression after by=
is due to embedding of important information (the letter) inside a string. I would recommend never doing that. Here's a reference if you want to convert to something better: Split comma-separated strings in a column into separate rows
Upvotes: 3