Our
Our

Reputation: 1035

Duplicate entries with `%in%`

I need to sum the data over certain rows as in

DT[(Rows %in% Is) & (Cols %in% Js), sum(Values), ]

But the issue is that both Is and Js contain duplicate entries, and I want the summation to be performed even in the duplicate cases.

Say Is = c(1,2,2,3) and Js = c(4,5,5,6), then since the condition matches to (2,5) twice, I want to sum the corresponding Value twice. How can I do that?

Edit:

     flowSpeed channelCrossSectionInPX   pxToMeters  timeStep     h      w channelCrossSectionInUM
1: 1.732085e-05                    -475 3.999282e-06 0.1265946 4e-04 0.0038            -0.001899659
2: 1.732085e-05                    -474 3.999282e-06 0.1265946 4e-04 0.0038            -0.001895660
3: 1.732085e-05                    -473 3.999282e-06 0.1265946 4e-04 0.0038            -0.001891660
4: 1.732085e-05                    -472 3.999282e-06 0.1265946 4e-04 0.0038            -0.001887661
5: 1.978421e-05                    -471 3.999282e-06 0.1265946 4e-04 0.0038            -0.001883662
6: 1.978421e-05                    -470 3.999282e-06 0.1265946 4e-04 0.0038            -0.001879663
DT[channelCrossSectionInPX %in% c(-472, -472), sum(h),]
[1] 4e-04

Upvotes: 1

Views: 45

Answers (2)

Waldi
Waldi

Reputation: 41230

You could try:

sum(DT[Is,Js,with=FALSE])

For example with mtcars:

library(data.table)

DT <- as.data.table(mtcars)

Is = c(1,2,2,3)
Js = c(4,5,5,6)

DT[Is,Js,with=FALSE]
#>       hp  drat  drat    wt
#>    <num> <num> <num> <num>
#> 1:   110  3.90  3.90 2.620
#> 2:   110  3.90  3.90 2.875
#> 3:   110  3.90  3.90 2.875
#> 4:    93  3.85  3.85 2.320

sum(DT[Is,Js,with=FALSE])
#> [1] 464.79

Upvotes: 2

akrun
akrun

Reputation: 887221

In the data, there is only a single instance of -472 and %in% return the logical vector for the lhs column value that matches the rhs vector. If we want to replicate, create an index, ie. use match

library(data.table)
DT[match(c(-472, -472), channelCrossSectionInPX), sum(h)]
[1] 8e-04

match works because it returns the index of the matching values

> DT[, match(c(-472, -472), channelCrossSectionInPX)]
[1] 4 4
> DT[match(c(-472, -472), channelCrossSectionInPX)]
      flowSpeed channelCrossSectionInPX   pxToMeters  timeStep     h      w channelCrossSectionInUM
1: 1.732085e-05                    -472 3.999282e-06 0.1265946 4e-04 0.0038            -0.001887661
2: 1.732085e-05                    -472 3.999282e-06 0.1265946 4e-04 0.0038            -0.001887661

data

DT <- structure(list(flowSpeed = c(1.732085e-05, 1.732085e-05, 1.732085e-05, 
1.732085e-05, 1.978421e-05, 1.978421e-05), channelCrossSectionInPX = -475:-470, 
    pxToMeters = c(3.999282e-06, 3.999282e-06, 3.999282e-06, 
    3.999282e-06, 3.999282e-06, 3.999282e-06), timeStep = c(0.1265946, 
    0.1265946, 0.1265946, 0.1265946, 0.1265946, 0.1265946), h = c(4e-04, 
    4e-04, 4e-04, 4e-04, 4e-04, 4e-04), w = c(0.0038, 0.0038, 
    0.0038, 0.0038, 0.0038, 0.0038), channelCrossSectionInUM = c(-0.001899659, 
    -0.00189566, -0.00189166, -0.001887661, -0.001883662, -0.001879663
    )), class = c("data.table", "data.frame"), row.names = c(NA, 
-6L), index = structure(integer(0), "`__channelCrossSectionInPX`" = integer(0)))

Upvotes: 2

Related Questions