Brandon Bertelsen
Brandon Bertelsen

Reputation: 44698

Row wise operation on data.table

Let's say I'd like to calculate the magnitude of the range over a few columns, on a row-by-row basis.

set.seed(1)
dat <- data.frame(x=sample(1:1000,1000),
                  y=sample(1:1000,1000),
                  z=sample(1:1000,1000))

Using data.frame(), I would do something like this:

dat$diff_range <- apply(dat,1,function(x) diff(range(x)))

To put it more simply, I'm looking for this operation, over each row:

diff(range(dat[1,]) # for i 1:nrow(dat)

If I were doing this for the entire table, it would be something like:

setDT(dat)[,diff_range := apply(dat,1,function(x) diff(range(x)))]

But how would I do it for only named (or numbered) rows?

Upvotes: 4

Views: 2565

Answers (3)

Frank
Frank

Reputation: 66819

pmax and pmin find the min and max across columns in a vectorized way, which is much better than splitting and working with each row separately. It's also pretty concise:

dat[, r := do.call(pmax,.SD) - do.call(pmin,.SD)]


        x   y   z   r
   1: 266 531 872 606
   2: 372 685 967 595
   3: 572 383 866 483
   4: 906 953 437 516
   5: 201 118 192  83
  ---                
 996: 768 945 292 653
 997:  61 231 965 904
 998: 771 145  18 753
 999: 841 148 839 693
1000: 857 252 218 639

Upvotes: 5

Troy
Troy

Reputation: 8701

How about this:

D[,list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I][c(1:4,15:18)]
#    I  V1
#1:  1 971
#2:  2 877
#3:  3 988
#4:  4 241
#5: 15 622
#6: 16 684
#7: 17 971
#8: 18 835

#actually this will be faster
D[c(1:4,15:18),list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I]

use .I to give you an index to call with the by= parameter, then you can run the function on each row. The second call pre-filters by any list of row numbers, or you can add a key and filter on that if your real table looks different.

Upvotes: 5

JeremyS
JeremyS

Reputation: 3525

You can do it by subsetting before/during the function. If you only want every second row for example

dat_Diffs <- apply(dat[seq(2,1000,by=2),],1,function(x) diff(range(x)))

Or for rownames 1:10 (since their names weren't specified they are just numbers counting up)

dat_Diffs <- apply(dat[rownames(dat) %in% 1:10,],1,function(x) diff(range(x)))

But why not just calculate per row then subset later?

Upvotes: 0

Related Questions