I am making some plots from large datasets. In this code the resulting plot objects are very small, but memory usage grows by much more than their size.
My finding so far is that the increase in memory usage seems to be due to a few objects. In particular, the value of the object tab_ind does not change after the plotting process (checked with the identical() function), but its size increases significantly (checked with the object.size() function). The only thing I do with tab_ind during the process is pass it to functions as an argument.
REPRODUCIBLE EXAMPLE
The size of the simulation can be controlled by varying N. At the end of the run, the changes in object sizes and the result of the identity check on tab_ind are printed.
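As a back-of-the-envelope check on how N drives memory, here is a sketch that assumes 4 bytes per integer and 8 bytes per double and ignores per-vector headers. Each dataset contributes on average mean(20:300) = 160 rows, so N = 6000 gives about 960,000 rows; every additional double column then costs roughly 7.3 MiB. The helper name estimate_tab_ind_mb is hypothetical.

```r
# Hypothetical helper: rough size of tab_ind in MiB, ignoring headers and attributes.
# Assumes 1 integer column (datasetID) plus 4 double columns
# (SIM_ADJ_PP1, MIX_ADJ_PP1 and their two logit columns).
estimate_tab_ind_mb <- function(N, extra_double_cols = 0) {
  expected_rows <- N * mean(20:300)                 # n ~ sample(20:300), mean 160
  bytes_per_row <- 4 + (4 + extra_double_cols) * 8  # 4-byte integer + 8-byte doubles
  expected_rows * bytes_per_row / 2^20
}

before_mb <- estimate_tab_ind_mb(6000)                         # columns created above
after_mb  <- estimate_tab_ind_mb(6000, extra_double_cols = 2)  # two more double columns
after_mb - before_mb                                           # cost of the extra columns
```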
library(data.table)
library(magrittr)
library(ggplot2)
N <- 6000
set.seed(runif(1, 0, .Machine$integer.max) %>% ceiling)
logit <- function(x) {return(log(x/(1-x)))}
invLogit <- function(x) {return(exp(x)/(1+exp(x)))}
tab_dat <- data.table(datasetID = seq(N), MIX_MIN_SUCCESS = sample(c(0, 1), N, replace = T), MIX_ALL = sample(c(0, 1), N, replace = T))
tab_dat[MIX_MIN_SUCCESS == 0, MIX_ALL := 0]
n <- sample(20:300, N, replace = T)
tab_ind <- data.table(
datasetID = rep(seq(N), times = n),
SIM_ADJ_PP1 = runif(sum(n), 0.00001, 0.99999),
MIX_ADJ_PP1 = runif(sum(n), 0.00001, 0.99999)
)
tab_ind[, c("SIM_ADJ_LOGIT_PP1", "MIX_ADJ_LOGIT_PP1") := list(logit(SIM_ADJ_PP1), logit(MIX_ADJ_PP1))]
checkMem_gc <- function(status) {
print(status)
print(memory.size())
gc()
print(memory.size())
}
## Individual bins for x and y
tab_by_bin_idxy <- function(dt, x, y, xNItv, yNItv, by = "quantile") {
#Binning
if (by == "even") {
checkMem_gc("start x-y breaks")
checkMem_gc("start x breaks")
minN = dt[, min(get(x), na.rm = T)]
checkMem_gc("after x min")
maxN = dt[, max(get(x), na.rm = T)]
checkMem_gc("after x max")
xBreaks = seq(minN, maxN, length.out = xNItv + 1)
checkMem_gc("after seq")
checkMem_gc("after x breaks")
yBreaks = dt[, seq(min(get(y), na.rm = T), max(get(y), na.rm = T), length.out = yNItv + 1)]
checkMem_gc("after y breaks")
} else if (by == "quantile") {
xBreaks = dt[, quantile(get(x), seq(0, 1, length.out = xNItv + 1), names = F)]
yBreaks = dt[, quantile(get(y), seq(0, 1, length.out = yNItv + 1), names = F)]
} else {stop("type of 'by' not supported")}
checkMem_gc("after x-y breaks")
xbinCode = dt[, .bincode(get(x), breaks = xBreaks, include.lowest = T)]
checkMem_gc("after x binCode")
xbinMid = sapply(seq(xNItv), function(i) {return(mean(xBreaks[c(i, i+1)]))})[xbinCode]
checkMem_gc("after x binMid")
ybinCode = dt[, .bincode(get(y), breaks = yBreaks, include.lowest = T)]
checkMem_gc("after y binCode")
ybinMid = sapply(seq(yNItv), function(i) {return(mean(yBreaks[c(i, i+1)]))})[ybinCode]
checkMem_gc("after y binMid")
#Creating table
tab_match = CJ(xbinCode = seq(xNItv), ybinCode = seq(yNItv))
checkMem_gc("after tab match")
tab_plot = data.table(xbinCode, xbinMid, ybinCode, ybinMid)[
tab_match, .(xbinMid = xbinMid[1], ybinMid = ybinMid[1], N = .N), keyby = .EACHI, on = c("xbinCode", "ybinCode")
]
checkMem_gc("after tab plot")
colnames(tab_plot)[colnames(tab_plot) == "xbinCode"] = paste0(x, "_binCode")
colnames(tab_plot)[colnames(tab_plot) == "xbinMid"] = paste0(x, "_binMid")
colnames(tab_plot)[colnames(tab_plot) == "ybinCode"] = paste0(y, "_binCode")
colnames(tab_plot)[colnames(tab_plot) == "ybinMid"] = paste0(y, "_binMid")
checkMem_gc("after col name")
rm(list = c("xBreaks", "yBreaks", "xbinCode", "ybinCode", "xbinMid", "ybinMid", "tab_match"))
checkMem_gc("after rm")
#Returning table
return(tab_plot)
}
tab_by_obin_x_str_y <- function(dt, x, y, width, Nbin, by = "even") {
#Binning
if (by == "even") {
xLLim = dt[, seq(min(get(x), na.rm = T), max(get(x), na.rm = T) - width, length.out = Nbin)]
xULim = dt[, seq(min(get(x), na.rm = T) + width, max(get(x), na.rm = T), length.out = Nbin)]
} else if (by == "quantile") {
xLLim = dt[, quantile(get(x), seq(0, 1 - width, length.out = Nbin), names = F)]
xULim = dt[, quantile(get(x), seq(width, 1, length.out = Nbin), names = F)]
} else {stop("type of 'by' not supported")}
xbinMid = (xLLim + xULim) / 2
#summarizing y
tab_out <- sapply(seq(Nbin), function(i) {
dt[get(x) >= xLLim[i] & get(x) <= xULim[i], c(mean(get(y), na.rm = T), sd(get(y), na.rm = T),
quantile(get(y), c(0.025, 0.975), names = F))]
}) %>% t %>% as.data.table %>% set_colnames(., c("mean", "sd", ".025p", ".975p")) %>%
cbind(data.table(binCode = seq(Nbin), xLLim, xbinMid, xULim), .)
tab_out[, c("mean_plus_1sd", "mean_minus_1sd") := list(mean + sd, mean - sd)]
return(tab_out)
}
plotEnv <- new.env()
backupEnv <- new.env()
gc()
gc()
checkMem_gc("Starting memory size checking")
start.mem.size <- memory.size()
start_ObjSizes <- sapply(ls(), function(x) {object.size(get(x))})
start_tab_ind <- tab_ind
start_tab_ind_size <- object.size(tab_ind)
dummyEnv <- new.env()
with(dummyEnv, {
## Set function for analyses against SIM_PP1
fcn_SIM_PP1 <- function(dt, newTab = T) {
dat_prob = tab_by_bin_idxy(dt, x = "SIM_ADJ_PP1", y = "MIX_ADJ_PP1", xNItv = 50, yNItv = 50, by = "even")
checkMem_gc("after tab prob")
dat_logit = tab_by_bin_idxy(dt, x = "SIM_ADJ_LOGIT_PP1", y = "MIX_ADJ_LOGIT_PP1",
xNItv = 50, yNItv = 50, by = "even")
checkMem_gc("after tab logit")
if ((!newTab) && exists("summarytab_logit_SIM_ADJ_PP1", where = backupEnv) &&
exists("summarytab_prob_SIM_ADJ_PP1", where = backupEnv)) {
summarytab_logit = get("summarytab_logit_SIM_ADJ_PP1", envir = backupEnv)
summarytab_prob = get("summarytab_prob_SIM_ADJ_PP1", envir = backupEnv)
} else {
summarytab_logit = tab_by_obin_x_str_y(dt, x = "SIM_ADJ_LOGIT_PP1", y = "MIX_ADJ_LOGIT_PP1",
width = 0.05, Nbin = 1000, by = "even")
summarytab_prob = summarytab_logit[, .(
binCode, invLogit(xLLim), invLogit(xbinMid), invLogit(xULim), invLogit(mean), sd,
invLogit(`.025p`), invLogit(`.975p`), invLogit(mean_plus_1sd), invLogit(mean_minus_1sd)
)] %>% set_colnames(colnames(summarytab_logit))
assign("summarytab_logit_SIM_ADJ_PP1", summarytab_logit, envir = backupEnv)
assign("summarytab_prob_SIM_ADJ_PP1", summarytab_prob, envir = backupEnv)
}
checkMem_gc("after summary tab")
plot_prob <- ggplot(dat_prob, aes(x = SIM_ADJ_PP1_binMid)) +
geom_vline(xintercept = 1, linetype = "dotted") +
geom_hline(yintercept = 1, linetype = "dotted") +
geom_abline(slope = 1, intercept = 0, size = 1.5, linetype = "dashed", alpha = 0.5) +
geom_point(aes(y = MIX_ADJ_PP1_binMid, size = N), alpha = 0.5, na.rm = T) +
geom_line(data = summarytab_prob, aes(x = xbinMid, y = mean), size = 1.25, color = "black", na.rm = T) +
geom_line(data = summarytab_prob, aes(x = xbinMid, y = mean_plus_1sd), size = 1.25, color = "blue", na.rm = T, linetype = "dashed") +
geom_line(data = summarytab_prob, aes(x = xbinMid, y = mean_minus_1sd), size = 1.25, color = "blue", na.rm = T, linetype = "dashed") +
scale_size_continuous(range = c(0.5, 5)) +
scale_x_continuous(name = "Simulated PP", breaks = seq(0, 1, 0.25),
labels = c("0%", "25%", "50%", "75%", "100%")) +
scale_y_continuous(name = "Estimated PP", limits = c(0, 1), breaks = seq(0, 1, 0.25),
labels = c("0%", "25%", "50%", "75%", "100%")) +
theme_classic() +
theme(axis.title = element_text(size = 18),
axis.text = element_text(size = 16))
checkMem_gc("after plot prob")
rm(dat_prob)
rm(summarytab_prob)
checkMem_gc("after removing dat_prob and summary_prob")
plot_logit <- ggplot(dat_logit, aes(x = SIM_ADJ_LOGIT_PP1_binMid)) +
geom_abline(slope = 1, intercept = 0, size = 1.5, linetype = "dashed", alpha = 0.5) +
geom_point(aes(y = MIX_ADJ_LOGIT_PP1_binMid, size = N), alpha = 0.5, na.rm = T) +
geom_line(data = summarytab_logit, aes(x = xbinMid, y = mean), size = 1.25, color = "black", na.rm = T) +
geom_line(data = summarytab_logit, aes(x = xbinMid, y = mean_plus_1sd), size = 1.25, color = "blue", na.rm = T, linetype = "dashed") +
geom_line(data = summarytab_logit, aes(x = xbinMid, y = mean_minus_1sd), size = 1.25, color = "blue", na.rm = T, linetype = "dashed") +
scale_size_continuous(range = c(0.5, 5)) +
scale_x_continuous(name = "Simulated LOGIT PP1",
breaks = c(0.00001, 0.001, 0.05, 0.5, 0.95, 0.999, 0.99999) %>% logit,
labels = c("0.001%", "0.1%", "5%", "50%", "95%", "99.9%", "99.999%")) +
scale_y_continuous(name = "Estimated LOGIT PP1", limits = c(-12, 12),
breaks = c(0.00001, 0.001, 0.05, 0.5, 0.95, 0.999, 0.99999) %>% logit,
labels = c("0.001%", "0.1%", "5%", "50%", "95%", "99.9%", "99.999%")) +
theme_classic() +
theme(axis.title = element_text(size = 18),
axis.text = element_text(size = 16))
checkMem_gc("after plot logit")
rm(summarytab_logit)
rm(dat_logit)
checkMem_gc("after removing dat_logit and summary_logit")
return(list(plot_prob, plot_logit))
}
checkMem_gc("after defining function")
## Tabling
tab_stat <- tab_ind[, c("MIX_MIN_SUCCESS", "MIX_ALL") := list(
tab_dat[tab_ind[, datasetID], MIX_MIN_SUCCESS],
tab_dat[tab_ind[, datasetID], MIX_ALL]
)]
checkMem_gc("after new tab_stat")
tab_stat_MIN_SUCCESS <- tab_stat[MIX_MIN_SUCCESS == 1]
checkMem_gc("after new new tab_stat_MIN_SUCCESS")
tab_stat_MIX_ALL <- tab_stat[MIX_ALL == 1]
checkMem_gc("after new tab_stat_MIX_ALL")
# Generating ggplot objects
print("--- start lst full ---")
lst_full <- fcn_SIM_PP1(tab_stat, newTab = F)
checkMem_gc("after lst full")
rm(tab_stat)
checkMem_gc("after rm tab_stat")
print("--- start lst MIN_SUCCESS ---")
lst_MIN_SUCCESS <- fcn_SIM_PP1(tab_stat_MIN_SUCCESS, newTab = F)
checkMem_gc("after lst MIN_SUCCESS")
rm(tab_stat_MIN_SUCCESS)
checkMem_gc("after rm tab_MIN_SUCCESS")
print("--- start lst MIX_ALL ---")
lst_MIX_ALL <- fcn_SIM_PP1(tab_stat_MIX_ALL, newTab = F)
checkMem_gc("after lst MIX_ALL")
rm(tab_stat_MIX_ALL)
checkMem_gc("after rm tab_stat_MIX_ALL")
## Start plotting
print("--- Start plotting ---")
assign("full_sp_MIX_ADJ_PP1_vs_SIM_ADJ_PP1", lst_full[[1]], envir = plotEnv)
checkMem_gc("after assign1")
assign("full_sp_MIX_ADJ_LOGIT_PP1_vs_SIM_ADJ_LOGIT_PP1", lst_full[[2]], envir = plotEnv)
checkMem_gc("after assign2")
rm(lst_full)
checkMem_gc("after removing lst_full")
assign("MIN_SUCCESS_sp_MIX_ADJ_PP1_vs_SIM_ADJ_PP1", lst_MIN_SUCCESS[[1]], envir = plotEnv)
checkMem_gc("after assign3")
assign("MIN_SUCCESS_sp_MIX_ADJ_LOGIT_PP1_vs_SIM_ADJ_LOGIT_PP1", lst_MIN_SUCCESS[[2]], envir = plotEnv)
checkMem_gc("after assign4")
rm(lst_MIN_SUCCESS)
checkMem_gc("after removing lst_MIN_SUCCESS")
assign("MIX_ALL_sp_MIX_ADJ_PP1_vs_SIM_ADJ_PP1", lst_MIX_ALL[[1]], envir = plotEnv)
checkMem_gc("after assign5")
assign("MIX_ALL_sp_MIX_ADJ_LOGIT_PP1_vs_SIM_ADJ_LOGIT_PP1", lst_MIX_ALL[[2]], envir = plotEnv)
checkMem_gc("after assign6")
rm(lst_MIX_ALL)
checkMem_gc("after removing lst_MIX_ALL")
})
checkMem_gc("--- Finishing ---")
rm(dummyEnv)
gc()
checkMem_gc("After clean up")
final.mem.size <- memory.size()
end_ObjSizes <- sapply(ls(), function(x) {object.size(get(x))})
print("")
print("")
print("--- The sizes of all objects (under .GlobalEnv) BEFORE the graph plotting process ---")
print("--- (Before the process starts, all existing objects are stored under .GlobalEnv) ---")
print(start_ObjSizes)
print("")
print("--- The sizes of all objects (under .GlobalEnv) AFTER the graph plotting process ---")
print(end_ObjSizes)
print("--- I have not altered any existing objects under .GlobalEnv during the process, I only passed them to functions. And yet their sizes increase! ---")
print("--- Let's look at the object tab_ind, which shows the largest inflation in object size ---")
print("--- This is the size of tab_ind BEFORE the process: ---")
print(start_tab_ind_size)
print("--- This is the size of tab_ind AFTER the process: ---")
print(object.size(tab_ind))
print("--- But they are identical (checked using the function identical())! ---")
print(identical(start_tab_ind, tab_ind))
print("")
UPDATED REPRODUCIBLE EXAMPLE
This is an updated, simpler reproducible example. My latest finding is that to make a copy of a data.table object, data.table::copy() should be used instead of <-. The latter only creates another reference to the same object. Modifying the object through the new name (for example, adding columns with :=) therefore also changes the object bound to the original name, which is why the size of tab_ind inflated when I changed what I thought was a new object. However, I am not sure whether this is the only source of the memory inflation.
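A minimal sketch of the behaviour described above, on a small toy table (dt, dt_alias, dt2 and snapshot are illustrative names, not from the original code). Plain <- binds a second name to the same data.table, so adding a column with := through either name changes both: object.size() grows, yet identical() still returns TRUE because both names refer to one object. With copy(), the snapshot stays independent.

```r
library(data.table)

dt <- data.table(id = 1:1000, x = runif(1000))
dt_alias <- dt                        # plain assignment: no copy is made
size_before <- object.size(dt)

dt_alias[, y := x * 2]                # := adds a column by reference

size_after <- object.size(dt)         # dt grew although it was never modified directly
same_object <- identical(dt, dt_alias)  # TRUE: both names point to the same object
has_y <- "y" %in% names(dt)             # TRUE: the new column is visible through dt

# copy() makes an independent deep copy, unaffected by later := changes
dt2 <- data.table(id = 1:1000, x = runif(1000))
snapshot <- copy(dt2)
dt2[, y := x * 2]
snapshot_has_y <- "y" %in% names(snapshot)  # FALSE
```

This mirrors the original example, where start_tab_ind <- tab_ind is taken before tab_ind gains new columns through :=.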
library(data.table)
library(magrittr)
library(ggplot2)
N <- 6000
set.seed(runif(1, 0, .Machine$integer.max) %>% ceiling)
logit <- function(x) {return(log(x/(1-x)))}
invLogit <- function(x) {return(exp(x)/(1+exp(x)))}
tab_dat <- data.table(datasetID = seq(N), MIX_MIN_SUCCESS = sample(c(0, 1), N, replace = T), MIX_ALL = sample(c(0, 1), N, replace = T))
tab_dat[MIX_MIN_SUCCESS == 0, MIX_ALL := 0]
n <- sample(20:300, N, replace = T)
tab_ind <- data.table(
datasetID = rep(seq(N), times = n),
SIM_ADJ_PP1 = runif(sum(n), 0.00001, 0.99999),
MIX_ADJ_PP1 = runif(sum(n), 0.00001, 0.99999)
)
## Individual bins for x and y
tab_by_bin_idxy <- function(dt, x, y, xNItv, yNItv, by = "quantile") {
#Binning
if (by == "even") {
minN = dt[, min(get(x), na.rm = T)]
maxN = dt[, max(get(x), na.rm = T)]
xBreaks = seq(minN, maxN, length.out = xNItv + 1)
yBreaks = dt[, seq(min(get(y), na.rm = T), max(get(y), na.rm = T), length.out = yNItv + 1)]
} else if (by == "quantile") {
xBreaks = dt[, quantile(get(x), seq(0, 1, length.out = xNItv + 1), names = F)]
yBreaks = dt[, quantile(get(y), seq(0, 1, length.out = yNItv + 1), names = F)]
} else {stop("type of 'by' not supported")}
xbinCode = dt[, .bincode(get(x), breaks = xBreaks, include.lowest = T)]
xbinMid = sapply(seq(xNItv), function(i) {return(mean(xBreaks[c(i, i+1)]))})[xbinCode]
ybinCode = dt[, .bincode(get(y), breaks = yBreaks, include.lowest = T)]
ybinMid = sapply(seq(yNItv), function(i) {return(mean(yBreaks[c(i, i+1)]))})[ybinCode]
#Creating table
tab_match = CJ(xbinCode = seq(xNItv), ybinCode = seq(yNItv))
tab_plot = data.table(xbinCode, xbinMid, ybinCode, ybinMid)[
tab_match, .(xbinMid = xbinMid[1], ybinMid = ybinMid[1], N = .N), keyby = .EACHI, on = c("xbinCode", "ybinCode")
]
colnames(tab_plot)[colnames(tab_plot) == "xbinCode"] = paste0(x, "_binCode")
colnames(tab_plot)[colnames(tab_plot) == "xbinMid"] = paste0(x, "_binMid")
colnames(tab_plot)[colnames(tab_plot) == "ybinCode"] = paste0(y, "_binCode")
colnames(tab_plot)[colnames(tab_plot) == "ybinMid"] = paste0(y, "_binMid")
rm(list = c("xBreaks", "yBreaks", "xbinCode", "ybinCode", "xbinMid", "ybinMid", "tab_match"))
#Returning table
return(tab_plot)
}
plotEnv <- new.env()
backupEnv <- new.env()
gc()
gc(verbose = T)
start.mem.size <- memory.size()
start_ObjSizes <- sapply(ls(), function(x) {object.size(get(x))})
start_tab_ind <- copy(tab_ind)
start_tab_ind_size <- object.size(tab_ind)
dummyEnv <- new.env()
with(dummyEnv, {
## Set function for analyses against SIM_PP1
fcn_SIM_PP1 <- function(dt, newTab = T) {
dat_prob = tab_by_bin_idxy(dt, x = "SIM_ADJ_PP1", y = "MIX_ADJ_PP1", xNItv = 50, yNItv = 50, by = "even")
plot_prob <- ggplot(dat_prob, aes(x = SIM_ADJ_PP1_binMid)) +
geom_vline(xintercept = 1, linetype = "dotted") +
geom_hline(yintercept = 1, linetype = "dotted") +
geom_abline(slope = 1, intercept = 0, size = 1.5, linetype = "dashed", alpha = 0.5) +
geom_point(aes(y = MIX_ADJ_PP1_binMid, size = N), alpha = 0.5, na.rm = T) +
scale_size_continuous(range = c(0.5, 5)) +
scale_x_continuous(name = "Simulated PP", breaks = seq(0, 1, 0.25),
labels = c("0%", "25%", "50%", "75%", "100%")) +
scale_y_continuous(name = "Estimated PP", limits = c(0, 1), breaks = seq(0, 1, 0.25),
labels = c("0%", "25%", "50%", "75%", "100%")) +
theme_classic() +
theme(axis.title = element_text(size = 18),
axis.text = element_text(size = 16))
return(plot_prob)
}
## Tabling
tab_stat <- copy(tab_ind)
tab_stat <- tab_stat[, c("MIX_MIN_SUCCESS", "MIX_ALL") := list(
tab_dat[tab_stat[, datasetID], MIX_MIN_SUCCESS],
tab_dat[tab_stat[, datasetID], MIX_ALL]
)]
tab_stat_MIN_SUCCESS <- tab_stat[MIX_MIN_SUCCESS == 1]
tab_stat_MIX_ALL <- tab_stat[MIX_ALL == 1]
# Generating ggplot objects
lst_full <- fcn_SIM_PP1(tab_stat, newTab = F)
lst_MIN_SUCCESS <- fcn_SIM_PP1(tab_stat_MIN_SUCCESS, newTab = F)
lst_MIX_ALL <- fcn_SIM_PP1(tab_stat_MIX_ALL, newTab = F)
## Start plotting
assign("full_sp_MIX_ADJ_PP1_vs_SIM_ADJ_PP1", lst_full, envir = plotEnv)
assign("MIN_SUCCESS_sp_MIX_ADJ_PP1_vs_SIM_ADJ_PP1", lst_MIN_SUCCESS, envir = plotEnv)
assign("MIX_ALL_sp_MIX_ADJ_PP1_vs_SIM_ADJ_PP1", lst_MIX_ALL, envir = plotEnv)
})
rm(dummyEnv)
rm(start_tab_ind)
gc(verbose = T)
final.mem.size <- memory.size()
end_ObjSizes <- sapply(ls(), function(x) {object.size(get(x))})
My sessionInfo() when running the above example:
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Hong Kong SAR.1252 LC_CTYPE=English_Hong Kong SAR.1252 LC_MONETARY=English_Hong Kong SAR.1252
[4] LC_NUMERIC=C LC_TIME=English_Hong Kong SAR.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.2.1 magrittr_1.5 data.table_1.11.4
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 scales_0.5.0 compiler_3.5.0 lazyeval_0.2.1 plyr_1.8.4 tools_3.5.0 pillar_1.2.3 gtable_0.2.0
[9] tibble_1.4.2 yaml_2.1.19 Rcpp_0.12.18 grid_3.5.0 rlang_0.2.1 munsell_0.4.3
My sense is that you need to increase --min-vsize=. Why? The error "cannot allocate vector of size ..." implies that you need to increase --min-vsize=.
R --min-vsize=400M
Create or add an entry in your .Renviron file:
R_VSIZE=400M
Ref: Friendly R Startup Configuration
If you answer "No" to either of these questions, I'd recommend you upgrade.
The reality here is that if you need to increase the minimum vsize, you likely want to look at your code for assignment gotchas. In most cases, you'll find that you are duplicating data via copy assignment.
For more information on R gotchas, I highly recommend you read:
R maintains separate areas for fixed and variable sized objects. The first of these is allocated as an array of cons cells (Lisp programmers will know what they are, others may think of them as the building blocks of the language itself, parse trees, etc.), and the second are thrown on a heap of ‘Vcells’ of 8 bytes each. Each cons cell occupies 28 bytes on a 32-bit build of R, (usually) 56 bytes on a 64-bit build.
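To see these two areas in a running session, gc() reports Ncells and Vcells separately. A small sketch that converts the Vcells count to MiB using the 8-bytes-per-Vcell figure quoted above; the result should agree with gc()'s own "(Mb)" column, which R rounds up to 0.1 Mb.

```r
usage <- gc()                          # full collection, then a usage matrix
cells_in_use <- usage[, "used"]        # named vector: Ncells, Vcells

# Each Vcell is 8 bytes, so the Vcells count converts directly to MiB
vcells_mib <- usage["Vcells", "used"] * 8 / 2^20
reported_mib <- usage["Vcells", "(Mb)"]  # first "(Mb)" column = memory in use
c(computed = vcells_mib, reported = reported_mib)
```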
The default values are (currently) an initial setting of 350k cons cells and 6Mb of vector heap. Note that the areas are not actually allocated initially: rather these values are the sizes for triggering garbage collection. These values can be set by the command line options --min-nsize and --min-vsize (or if they are not used, the environment variables R_NSIZE and R_VSIZE) when R is started. Thereafter R will grow or shrink the areas depending on usage, never decreasing below the initial values. The maximal vector heap size can be set with the environment variable R_MAX_VSIZE.
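To check from inside a session whether an .Renviron entry such as R_VSIZE=400M was picked up, a minimal sketch can read the variables back (an empty string means the variable is not set). Since these values are used when R is started, setting them later with Sys.setenv() in a running session should not be expected to resize the heap.

```r
# Read back the start-up settings; "" means the variable is not set
vsize_settings <- Sys.getenv(c("R_NSIZE", "R_VSIZE", "R_MAX_VSIZE"), unset = "")
print(vsize_settings)

# The "gc trigger" column shows the current thresholds for Ncells and Vcells
triggers <- gc()[, "gc trigger"]
print(triggers)
```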
How much time R spends in the garbage collector will depend on these initial settings and on the trade-off the memory manager makes, when memory fills up, between collecting garbage to free up unused memory and growing these areas. The strategy used for growth can be specified by setting the environment variable R_GC_MEM_GROW to an integer value between 0 and 3. This variable is read at start-up. Higher values grow the heap more aggressively, thus reducing garbage collection time but using more memory.
Ref: https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/Memory
The address-space limit is 2Gb under 32-bit Windows unless the OS's default has been changed to allow more (up to 3Gb). See https://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx and https://msdn.microsoft.com/en-us/library/bb613473(VS.85).aspx. Under most 64-bit versions of Windows the limit for a 32-bit build of R is 4Gb: for the oldest ones it is 2Gb. The limit for a 64-bit build of R (imposed by the OS) is 8Tb.
It is not normally possible to allocate as much as 2Gb to a single vector in a 32-bit build of R even on 64-bit Windows because of preallocations by Windows in the middle of the address space.
Under Windows, R imposes limits on the total memory allocation available to a single session as the OS provides no way to do so: see memory.size and memory.limit.
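Because memory.size is Windows-specific, the question's checkMem_gc() helper only reports useful numbers on Windows. A hypothetical portable variant (check_mem_gc is an illustrative name) can rely on gc() instead. Note that it reports memory in use by R's heap after a collection, which is not the same quantity memory.size() measures.

```r
# Hypothetical portable replacement for checkMem_gc(): works on any platform
check_mem_gc <- function(status) {
  usage <- gc()                        # runs a full collection, then reports usage
  used_mb <- sum(usage[, "(Mb)"])      # first "(Mb)" column = Ncells + Vcells in use
  message(status, ": ", format(used_mb, nsmall = 1), " Mb in use after gc()")
  invisible(used_mb)
}

mb <- check_mem_gc("after building tab_ind")
```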