Reputation: 335
Can someone please suggest vectorized alternative for the following code:
x=read.table("1.txt")
y=read.table("unique.txt")
nrowx=nrow(x)
nrowy=nrow(y)
for(i in 1:nrowy)
{
y[i,3]=0
for(j in i:nrowx)
{
if((y[i,1]==x[j,1])&(y[i,2]==x[j,2]))
{
y[i,3] = (y[i,3] + x[j,3])
}
}
}
File y contains unique elements of x (considering 1st 2 columns). For each entry in y, all possible matches are found in x, and sum of the corresponding values in column 3 of x is stored in 3rd column of y. This is done for each row of y.
Few lines of file read in x:
"X1" "X2" "X3"
"1" 4 10 -1440
"2" 4 10 -3765
"3" 10 22 523
"4" 10 295 730
"5" 10 295 1599
"6" 10 584 1872
"7" 10 403 1872
"8" 10 403 1872
"9" 10 281 554
"10" 10 123 554
Few lines of file read in y:
"X1" "X2" "X3"
"1" 4 10 NA
"3" 10 22 NA
"4" 10 295 NA
"6" 10 584 NA
"7" 10 403 NA
"9" 10 281 NA
"10" 10 123 NA
Expected output:
X1 X2 X3
1 4 10 -5205
2 10 22 523
3 10 123 554
4 10 281 554
5 10 295 2329
6 10 403 3744
7 10 584 1872
The files are very large and these loops are taking very much time. Some alternative that doesn't involve loops is welcome. Thanks!
Upvotes: 2
Views: 171
Reputation: 124648
It seems the aggregate
function can be useful here:
aggregate(x, list(x$X1, x$X2), sum) -> a
data.frame(X1=a$Group.1, X2=a$Group.2, X3=a$X3)
X1 X2 X3
1 4 10 -5205
2 10 22 523
3 10 123 554
4 10 281 554
5 10 295 2329
6 10 403 3744
7 10 584 1872
Is this faster? Let me know.
Upvotes: 2