MattS
MattS

Reputation: 63

Modify column of a data.table based on another column and add the new column

I have a data.table DT with two columns

   V1 V2
1:  1  3
2:  2  4
3:  3  5
4:  2  2
5:  3  8
6:  1  4
7:  2  5

For each row I want to take all the entries with the same V1 and add the V2 entries then divide the V2 entry by that sum and add in a third column. For example, in row 1, column 3 the answer should be 3/(3+4). In row 2, column 3 the answer should be 4/(4+2+5), etc.

Ultimately I should have

   V1 V2 V3
1:  1  3 0.4285714
2:  2  4 0.3636364
3:  3  5 0.3846154
4:  2  2 0.1818182
5:  3  8 0.6153846
6:  1  4 0.5714286
7:  2  5 0.4545455

I can get V3 via q <- DT[,V2/sum(V2),by='V1'] but then the rows are in the wrong order

   V1        V1
1:  1 0.4285714
2:  1 0.5714286
3:  2 0.3636364
4:  2 0.1818182
5:  2 0.4545455
6:  3 0.3846154
7:  3 0.6153846

so simply pasting the second column of q to DT will not work. Also, somewhat awkwardly the data.table q now has two columns with the same name, V1.

I have been banging my head on this problem for a few days now, searched high and low and still cannot come up with a simple answer. Any help would be much appreciated.

Upvotes: 4

Views: 1876

Answers (1)

sds
sds

Reputation: 60014

Create a new table using

DT[,list(V2=V2, V3=V2/sum(V2)), by='V1']

(with modified row order) or modify the data.table in place using the assignment operator:

DT[, V3 := V2/sum(V2), by='V1'] 

Note that now the row order is the same.

Please RTFM (half the questions I ask about data.table I could have answered myself if I spent an extra 30 minutes perusing it!)

As for the row order in the new table, I don't think you can easily preserve it. It is also unclear why you would want to preserve the order unless there is another column which dictates it already, in which case you can sort the new table accordingly.

Upvotes: 4

Related Questions