LucasMation
LucasMation

Reputation: 2489

Carryover string value based on max() of numeric variable by group in R data.table

Is there a short and sweet code to Carryover string value of a desired line to the whole group?

As a reference, for numeric variables I may want to carryover the value of a numeric variable (y) within group to all observations of that group based on the max() value of another variable (x). I do that by:

d <- data.table(id  =c('A','A','A','A','B','B','B','B','B'),
                x =c(10, 1, 4,  NA, NA, NA, NA, 9 , 23),
                y =c( 7, 6, 23, 1 , 2, NA, NA, 9 , 4),
                char=c('W','X','Y','Z','T',NA, NA, NA, NA))

d[,aux:=(x==max(x,na.rm=T) & !is.na(x)),by=id]
d[,aux2:=y*aux,by=id]
d[,y_carry_max:=max(aux2,na.rm=T),by=id]

What if instead of carryin a numeric value, I want to carry the value of a string variable (char), based on aux (which marks the maximum value of x)

I suppose this would be an intermediate step

d[aux==T,char_aux:=char,by=id]

How can I carryover the value of char_aux across the other lines of each gruop to create the variable char_carry_max ?

EDIT1: the desired output is the last column:

   id  x  y char   aux aux2 y_carry_max char_aux char_carry_max
1:  A 10  7    W  TRUE    7           7        W              W
2:  A  1  6    X FALSE    0           7       NA              W
3:  A  4 23    Y FALSE    0           7       NA              W
4:  A NA  1    Z FALSE    0           7       NA              W
5:  B NA  2    T FALSE    0           4       NA              P
6:  B NA NA   NA FALSE   NA           4       NA              P
7:  B NA NA   NA FALSE   NA           4       NA              P
8:  B  9  9   NA FALSE    0           4       NA              P
9:  B 23  4    P  TRUE    4           4        P              P

Edit2: regarding @AdagioMolto comment: "Does each value in x correspond to a unique value in char? What if two or more rows feature x == max(x)? Which char should be taken?"

Good question. Assume they are unique for the purpose of this question. What I do in practice is to add a random perturbation of smaller order of magnitude to break ties. In the example above it whould be: d[,x:=x+ (runif(.N)/1000)]

Edit3: besides the nice dplyr answer bellow, is the a more native data.table way of doing this?

Upvotes: 0

Views: 62

Answers (1)

MrSmithGoesToWashington
MrSmithGoesToWashington

Reputation: 1076

With dplyr (and comments from @Frank and @LucasMation) :

d %>% group_by(id) %>% 
mutate(char_carry_max = char[which.max(x)], y_carry_max = y[which.max(x)]) %>% 
data.table()

Upvotes: 1

Related Questions