phoenix
phoenix

Reputation: 335

Vectorized alternative for finding sum against each match

Can someone please suggest vectorized alternative for the following code:

x=read.table("1.txt")
y=read.table("unique.txt")
nrowx=nrow(x)
nrowy=nrow(y)
for(i in 1:nrowy)
    {
        y[i,3]=0
        for(j in i:nrowx)            
        {
            if((y[i,1]==x[j,1])&(y[i,2]==x[j,2]))
            {
                y[i,3] = (y[i,3] + x[j,3])
            }
        }
    }

File y contains unique elements of x (considering 1st 2 columns). For each entry in y, all possible matches are found in x, and sum of the corresponding values in column 3 of x is stored in 3rd column of y. This is done for each row of y.

Few lines of file read in x:

"X1" "X2" "X3"
"1" 4 10 -1440
"2" 4 10 -3765
"3" 10 22 523
"4" 10 295 730
"5" 10 295 1599
"6" 10 584 1872
"7" 10 403 1872
"8" 10 403 1872
"9" 10 281 554
"10" 10 123 554

Few lines of file read in y:

"X1" "X2" "X3" 
"1" 4 10 NA 
"3" 10 22 NA 
"4" 10 295 NA 
"6" 10 584 NA 
"7" 10 403 NA 
"9" 10 281 NA 
"10" 10 123 NA 

Expected output:

X1  X2    X3
1  4  10 -5205
2 10  22   523
3 10 123   554
4 10 281   554
5 10 295  2329
6 10 403  3744
7 10 584  1872

The files are very large and these loops are taking very much time. Some alternative that doesn't involve loops is welcome. Thanks!

Upvotes: 2

Views: 171

Answers (1)

janos
janos

Reputation: 124648

It seems the aggregate function can be useful here:

aggregate(x, list(x$X1, x$X2), sum) -> a
data.frame(X1=a$Group.1, X2=a$Group.2, X3=a$X3)
  X1  X2    X3
1  4  10 -5205
2 10  22   523
3 10 123   554
4 10 281   554
5 10 295  2329
6 10 403  3744
7 10 584  1872

Is this faster? Let me know.

Upvotes: 2

Related Questions