Reputation: 2489
Is there a short and sweet code to Carryover string value of a desired line to the whole group?
As a reference, for numeric variables I may want to carryover the value of a numeric variable (y) within group to all observations of that group based on the max() value of another variable (x). I do that by:
d <- data.table(id =c('A','A','A','A','B','B','B','B','B'),
x =c(10, 1, 4, NA, NA, NA, NA, 9 , 23),
y =c( 7, 6, 23, 1 , 2, NA, NA, 9 , 4),
char=c('W','X','Y','Z','T',NA, NA, NA, NA))
d[,aux:=(x==max(x,na.rm=T) & !is.na(x)),by=id]
d[,aux2:=y*aux,by=id]
d[,y_carry_max:=max(aux2,na.rm=T),by=id]
What if instead of carryin a numeric value, I want to carry the value of a string variable (char
), based on aux
(which marks the maximum value of x
)
I suppose this would be an intermediate step
d[aux==T,char_aux:=char,by=id]
How can I carryover the value of char_aux
across the other lines of each gruop to create the variable char_carry_max
?
EDIT1: the desired output is the last column:
id x y char aux aux2 y_carry_max char_aux char_carry_max
1: A 10 7 W TRUE 7 7 W W
2: A 1 6 X FALSE 0 7 NA W
3: A 4 23 Y FALSE 0 7 NA W
4: A NA 1 Z FALSE 0 7 NA W
5: B NA 2 T FALSE 0 4 NA P
6: B NA NA NA FALSE NA 4 NA P
7: B NA NA NA FALSE NA 4 NA P
8: B 9 9 NA FALSE 0 4 NA P
9: B 23 4 P TRUE 4 4 P P
Edit2: regarding @AdagioMolto comment: "Does each value in x correspond to a unique value in char? What if two or more rows feature x == max(x)? Which char should be taken?"
Good question. Assume they are unique for the purpose of this question. What I do in practice is to add a random perturbation of smaller order of magnitude to break ties. In the example above it whould be: d[,x:=x+ (runif(.N)/1000)]
Edit3: besides the nice dplyr answer bellow, is the a more native data.table way of doing this?
Upvotes: 0
Views: 62
Reputation: 1076
With dplyr (and comments from @Frank and @LucasMation) :
d %>% group_by(id) %>%
mutate(char_carry_max = char[which.max(x)], y_carry_max = y[which.max(x)]) %>%
data.table()
Upvotes: 1