Conditional sum on a non-equivalent data frame in R

Question

Yesterday I asked how to sum a column based on a condition in a different data.frame. The approach worked on small subsets, but it took hours on the full data. So I thought: why not force a join with the plyr rbind.fill function and then do the conditional sum? Then I realized I don't know how, so I was hoping you could help. This is the head of the combined data:

               a           b          c          d             
1        1010001 4507888.889         NA         NA               
2        1010011  843166.708         NA         NA              
3        1010021  612500.000         NA         NA               
4        1010031  740000.000         NA         NA               
5        1010041    4166.667         NA         NA               
6        1010051 3366666.667         NA         NA

This is the tail:

                    a   b                 c          d            
689085             NA  NA             70.62    181.1278    
689086             NA  NA            106.30   2383.3616     
689087             NA  NA            768.80 248804.5507    
689088             NA  NA            512.30 189899.9227     
689089             NA  NA            144.70 176382.4634     
689090             NA  NA            340.90 264691.8022

What I'm trying to do is take each value of b, compare it to all values of d, and then sum all values of c that fulfill the condition b (just one value) >= d (all values). I've tried this:

df <- df %>% mutate(sumc = sum(df$c[b >= df$d]))

This gives me a column sumc full of 0. For the head and tail shown above, the output I expect should look something like this:

               a            b       c            d        e
1        1010001  4507888.889      NA           NA  1943.62
2        1010011   843166.708      NA           NA  1943.62
3        1010021   612500.000      NA           NA  1943.62
4        1010031   740000.000      NA           NA  1943.62
5        1010041     4166.667      NA           NA   176.92
6        1010051  3366666.667      NA           NA  1943.62
689085        NA           NA   70.62     181.1278       NA
689086        NA           NA  106.30    2383.3616       NA
689087        NA           NA  768.80  248804.5507       NA
689088        NA           NA  512.30  189899.9227       NA
689089        NA           NA  144.70  176382.4634       NA
689090        NA           NA  340.90  264691.8022       NA
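
To make the intended computation concrete, here is a minimal brute-force sketch in base R on the sample values above. The vector names c_vals and d_vals are assumptions chosen for illustration (a plain c would mask the base function c). It compares each b against every d, so it is likely too slow for the full data, but it can serve as a check for a faster method on a small subset.

```r
# Sample columns copied from the head and tail shown above
# (c_vals and d_vals are hypothetical names for columns c and d)
b      <- c(4507888.889, 843166.708, 612500.000, 740000.000, 4166.667, 3366666.667)
c_vals <- c(70.62, 106.30, 768.80, 512.30, 144.70, 340.90)
d_vals <- c(181.1278, 2383.3616, 248804.5507, 189899.9227, 176382.4634, 264691.8022)

# For each single value of b, sum every c whose d is at most that b
sumc <- sapply(b, function(x) sum(c_vals[d_vals <= x]))
print(sumc)
```

Each element of sumc corresponds to one row of the head, in the same order as b.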

I also tried group_by(a) to keep only the rows for which sumc takes a value, but it doesn't work.

Thanks to everyone reading this! :)

chinsoon12 · Accepted Answer

Here is an option using a rolling join in data.table:

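# Running total of C in increasing order of D
DT[order(D), csc := cumsum(C)]

# For each B, roll back to the largest D <= B and take its running total
DT[, sumc := 
    DT[!is.na(D)][DT, on=.(D=B), roll=Inf, mult="last", csc]
]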

output:

          A           B      C           D     csc    sumc
 1: 1010001 4507888.889     NA          NA      NA 1943.62
 2: 1010011  843166.708     NA          NA      NA 1943.62
 3: 1010021  612500.000     NA          NA      NA 1943.62
 4: 1010031  740000.000     NA          NA      NA 1943.62
 5: 1010041    4166.667     NA          NA      NA  176.92
 6: 1010051 3366666.667     NA          NA      NA 1943.62
 7:      NA          NA  70.62    181.1278   70.62      NA
 8:      NA          NA 106.30   2383.3616  176.92      NA
 9:      NA          NA 768.80 248804.5507 1602.72      NA
10:      NA          NA 512.30 189899.9227  833.92      NA
11:      NA          NA 144.70 176382.4634  321.62      NA
12:      NA          NA 340.90 264691.8022 1943.62      NA

data:

library(data.table)
DT <- fread("A           B          C          D             
1010001 4507888.889         NA         NA               
1010011  843166.708         NA         NA              
1010021  612500.000         NA         NA               
1010031  740000.000         NA         NA               
1010041    4166.667         NA         NA               
1010051 3366666.667         NA         NA 
NA  NA             70.62    181.1278    
NA  NA            106.30   2383.3616     
NA  NA            768.80 248804.5507    
NA  NA            512.30 189899.9227     
NA  NA            144.70 176382.4634     
NA  NA            340.90 264691.8022")
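
The same idea (sort by D, take a running total of C, then look up each B) can also be sketched in base R with findInterval. This is an illustrative alternative under assumptions, not the code of the answer above: the vector names are hypothetical, and it returns 0 when no D is at or below a given B, which may differ from what the join returns in that case.

```r
# Sample values from the data above (vector names are hypothetical)
b      <- c(4507888.889, 843166.708, 612500.000, 740000.000, 4166.667, 3366666.667)
c_vals <- c(70.62, 106.30, 768.80, 512.30, 144.70, 340.90)
d_vals <- c(181.1278, 2383.3616, 248804.5507, 189899.9227, 176382.4634, 264691.8022)

# Sort by d and build the running total of c in that order
ord      <- order(d_vals)
d_sorted <- d_vals[ord]
csc      <- cumsum(c_vals[ord])

# For each b, count how many sorted d values are <= b
idx <- findInterval(b, d_sorted)

# Prepend 0 so that idx == 0 (no d <= b) maps to a sum of 0
sumc_fast <- c(0, csc)[idx + 1]
print(sumc_fast)
```

Sorting once and doing one lookup per value of B avoids comparing every B against every D, which is presumably why the pairwise approach took hours on the full data.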
