mat

Reputation: 2617

Not enough memory to row bind two large datasets

I am trying to row-bind two very large datasets, but I don't have enough RAM.

What I tried is:

  1. Read each dataset via A <- fread("A.csv"); B <- fread("B.csv")
  2. Run gc() to free the memory.
  3. Bind the two with AB <- rbindlist(list(A, B), fill = TRUE)

I then get an out-of-memory error: Error: cannot allocate vector of size 1.6 Mb.

Note that the two datasets have different column names, which is why I need fill = TRUE. The datasets also contain both numeric and character values.
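
For illustration, a minimal sketch of what fill = TRUE does when the column names differ. The tables A_demo and B_demo are made-up stand-ins, not the real data; columns missing from one table are filled with NA in the result.

```r
library(data.table)

# Hypothetical toy tables: they share "id" but differ in the other column.
A_demo <- data.table(id = 1:2, x = c(1.5, 2.5))  # numeric column x
B_demo <- data.table(id = 3L, y = "a")           # character column y

# fill = TRUE matches columns by name and fills the gaps with NA.
AB_demo <- rbindlist(list(A_demo, B_demo), fill = TRUE)
print(AB_demo)
```

The result has the union of the columns (id, x, y), with A's columns first.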

How can I combine the two without running into RAM issues?



Upvotes: 0

Views: 244

Answers (1)

jblood94

Reputation: 16981

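If the combined dataset fits into memory on its own, you could combine the tables on disk instead: write B to a CSV with the full set of columns, append A to it via fwrite with append = TRUE, and then read the result back in.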

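library(data.table)

# Write B to AB.csv with the union of A's and B's columns.
# One row of A is read only to supply A's column names; [-1] drops it again.
fwrite(
  rbindlist(
    list(
      fread("A.csv", nrows = 1L),
      fread("B.csv")
    ),
    fill = TRUE
  )[-1],
  "AB.csv"
)
# Append A's rows. A's columns come first in AB.csv, so the appended rows
# line up and are only short of the columns that exist solely in B.
fwrite(
  fread("A.csv"),
  "AB.csv",
  append = TRUE
)
# maybe restart R here
# fill = TRUE lets fread accept the shorter rows that came from A
AB <- fread("AB.csv", fill = TRUE)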
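
To try the approach without touching the real files, here is a small self-contained sketch of the same steps on toy data. The temporary directory, the path variables, and the contents of the two toy CSVs are all made up for illustration. It assumes, as above, that A's columns end up first in the combined header.

```r
library(data.table)

# Hypothetical toy inputs standing in for A.csv and B.csv.
demo_dir <- tempfile("rbind_demo_")
dir.create(demo_dir)
a_path  <- file.path(demo_dir, "A.csv")
b_path  <- file.path(demo_dir, "B.csv")
ab_path <- file.path(demo_dir, "AB.csv")

fwrite(data.table(id = 1:3, x = c(1.5, 2.5, 3.5)), a_path)
fwrite(data.table(id = 4:5, y = c("u", "v")), b_path)

# Step 1: write B under the union of columns (A's columns, then B's extras).
# The single row read from A only supplies column names and is dropped by [-1].
fwrite(
  rbindlist(list(fread(a_path, nrows = 1L), fread(b_path)), fill = TRUE)[-1],
  ab_path
)

# Step 2: append A's rows; no header is written when appending to an existing file.
fwrite(fread(a_path), ab_path, append = TRUE)

# Step 3: read the combined file; fill = TRUE accepts the shorter rows from A.
AB <- fread(ab_path, fill = TRUE)
print(AB)
```

Note that B's rows come before A's in the combined table, so reorder afterwards if the row order matters.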

Upvotes: 1
