SteveE

Reputation: 51

Error encountered when using summaryRprof when setting memory.profiling = TRUE in Rprof

For pedagogical purposes I am trying to compare the memory use profiles of two simple functions. However, I am hitting a problem that I suspect is a bug in either Rprof() or summaryRprof(). Does anyone have any idea what I am doing wrong, or how I can work around this problem?

    fun.rbind <- function(indata) {

      outdata <- NULL
      n <- nrow(indata)

      for (i in 1:n) {
        if (!any(is.na(indata[i, ]))) outdata <- rbind(outdata, indata[i, ])
      }
      outdata
    }

And,

    fun.omit <- function(indata) {

      drop <- FALSE
      n <- ncol(indata)

      for (i in 1:n) drop <- drop | is.na(indata[, i])
      indata[!drop, ]
    }

The second version should be much more efficient in terms of both execution time and memory use.
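Before comparing profiles, it may be worth confirming that the two functions really return the same rows. The following is a minimal sketch on a small matrix; the seed, the matrix size, and the comparison against complete.cases() are illustrative assumptions, not part of the original setup.

```r
# Both functions are copied from the question so this block runs on its own.
fun.rbind <- function(indata) {
  outdata <- NULL
  n <- nrow(indata)
  for (i in 1:n) {
    if (!any(is.na(indata[i, ]))) outdata <- rbind(outdata, indata[i, ])
  }
  outdata
}

fun.omit <- function(indata) {
  drop <- FALSE
  n <- ncol(indata)
  for (i in 1:n) drop <- drop | is.na(indata[, i])
  indata[!drop, ]
}

# Small test matrix (assumed size, chosen so fun.rbind finishes quickly).
set.seed(1)
small <- matrix(rnorm(2000), 100, 20)
small[small > 2] <- NA

res.rbind <- fun.rbind(small)
res.omit  <- fun.omit(small)

# Built-in alternative: keep only rows without any NA.
res.cc <- small[complete.cases(small), , drop = FALSE]

# Compare values only, ignoring any dimnames rbind() might attach.
same.rows <- isTRUE(all.equal(unname(res.rbind), unname(res.omit)))
same.as.cc <- identical(res.omit, res.cc)
```

If both flags are TRUE, any difference in the profiles comes from how the result is built (growing a matrix row by row with rbind() versus a single logical subset), not from a difference in output.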

If I do the following, it works just fine.

    > data.matrix <- matrix(rnorm(2000000), 100000, 20)
    > data.matrix[data.matrix > 2] <- NA
    > Rprof("fun.omit.out", memory.profiling = TRUE)
    > y <- fun.omit(data.matrix)
    > Rprof(NULL)
    > summaryRprof("fun.omit.out",  memory="both")
$by.self
           self.time self.pct total.time total.pct mem.total
"fun.omit"      0.04       50       0.08       100      38.5
"|"             0.04       50       0.04        50      22.5

$by.total
           total.time total.pct mem.total self.time self.pct
"fun.omit"       0.08       100      38.5      0.04       50
"|"              0.04        50      22.5      0.04       50

$sample.interval
[1] 0.02

$sampling.time
[1] 0.08

However, the same operation on the first function fails with a cryptic error message.

    > Rprof("fun.rbind.out", memory.profiling = TRUE)
    > y <- fun.rbind(data.matrix)
    > Rprof(NULL)
    > summaryRprof("fun.rbind.out",  memory = "both")
Error in rowsum.default(memcounts[rep.int(seq_along(memcounts), ulen)],  : 
  unimplemented type 'NULL' in 'HashTableSetup'
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

If I omit the memory = "both" option, everything works as expected.

    > summaryRprof("fun.rbind.out")
$by.self
            self.time self.pct total.time total.pct
"rbind"        312.62    99.64     312.62     99.64
"fun.rbind"      0.84     0.27     313.74    100.00
"any"            0.12     0.04       0.12      0.04
"!"              0.08     0.03       0.08      0.03
"is.na"          0.08     0.03       0.08      0.03

$by.total
            total.time total.pct self.time self.pct
"fun.rbind"     313.74    100.00      0.84     0.27
"rbind"         312.62     99.64    312.62    99.64
"any"             0.12      0.04      0.12     0.04
"!"               0.08      0.03      0.08     0.03
"is.na"           0.08      0.03      0.08     0.03

$sample.interval
[1] 0.02

$sampling.time
[1] 313.74

Upvotes: 2

Views: 337

Answers (1)

SteveE

Reputation: 51

In case anyone is interested, I believe the source of my problem is a bug in summaryRprof(). With the memory = "stats" or memory = "tseries" option I get the expected results every time. Apparently the memory = "both" option does not always work.

For example, the following works as I would expect.

> gctorture(on = TRUE)
> Rprof("fun.rbind.out", memory.profiling = TRUE)
> y <- fun.rbind(data.matrix)
> Rprof(NULL) 
> gctorture(on = FALSE)
> summaryRprof("fun.rbind.out", memory = "stats")
index: "fun.rbind"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
             188           173856            19658          8962426 
           nodes        max.nodes     duplications tot.duplications 
           16630         15977360                1              860 
         samples 
            1009 
--------------------------------------------------------------- 
index: "fun.rbind":"!"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
              17               17            16660            16660 
           nodes        max.nodes     duplications tot.duplications 
            1456             1456                1                1 
         samples 
               1 
--------------------------------------------------------------- 
index: "fun.rbind":"any"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
              22               82             9383            67880 
           nodes        max.nodes     duplications tot.duplications 
             796             2688                1              896 
         samples 
             741 
--------------------------------------------------------------- 
index: "fun.rbind":"is.na"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
              18               51             9561            34040 
           nodes        max.nodes     duplications tot.duplications 
             958             2520                1               51 
         samples 
              91 
--------------------------------------------------------------- 
index: "fun.rbind":"rbind"
     vsize.small  max.vsize.small      vsize.large  max.vsize.large 
               8               51             5966            68440 
           nodes        max.nodes     duplications tot.duplications 
             777             2800                1              948 
         samples 
            1215 

While it is a bit of extra work to summarize the profiles of execution time and memory use in two separate steps, I can live with this.
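The two-step summary could be wrapped in a small helper along these lines. This is only a sketch: profile.two.step is a hypothetical name, and it assumes memory = "stats" is the memory summary you want.

```r
# Hypothetical helper (not part of R): profile one expression with memory
# profiling enabled, then summarise time and memory in two separate calls
# instead of relying on memory = "both".
profile.two.step <- function(expr, prof.file = tempfile(fileext = ".out")) {
  Rprof(prof.file, memory.profiling = TRUE)
  # 'expr' is a promise; forcing it here runs the code while profiling is on.
  # Profiling is switched off again even if the expression fails.
  tryCatch(force(expr), finally = Rprof(NULL))
  list(
    timing = summaryRprof(prof.file),                   # default memory = "none"
    memory = summaryRprof(prof.file, memory = "stats")  # memory statistics only
  )
}

# Usage, assuming fun.rbind and data.matrix are defined as in the question:
#   res <- profile.two.step(fun.rbind(data.matrix))
#   res$timing$by.self
#   res$memory
```

Note that summaryRprof() needs at least one sample in the profile file, so the profiled expression has to run for longer than the sampling interval.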

Upvotes: 2
