Andreas Chiocchetti

Reputation: 31

h2o.importFile() does not import full data frame in R

I have a data frame with 50 rows (subjects) and 572288 columns (variables).

When I parse the data.frame into an H2O object, I lose variables and end up with 51 rows and 419431 variables.

The problem persists whether I reduce or increase the number of rows.

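library("data.table")
library("h2o")
# let h2o use data.table for faster data transfer
options("h2o.use.data.table"=TRUE)
h2o.init()
# build a 50 x 572288 data frame filled with 1s and write it to disk
trainset=as.data.frame(matrix(ncol=572288,nrow=50,1))
fwrite(trainset, "train.csv", sep=",")
# read the file back into H2O and compare dimensions
train=h2o.importFile("train.csv", sep=",")
dim(trainset)
dim(train)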

My output is:

> h2o.init()
 Connection successful!

R is connected to the H2O cluster:
H2O cluster uptime:         1 hours 2 minutes
H2O cluster timezone:       Europe/Berlin
H2O data parsing timezone:  UTC
H2O cluster version:        3.18.0.11
H2O cluster version age:    3 months
H2O cluster name:           H2O_started_from_R_chiocchetti_lub856
H2O cluster total nodes:    1
H2O cluster total memory:   9.84 GB
H2O cluster total cores:    24
H2O cluster allowed cores:  20
H2O cluster healthy:        TRUE
H2O Connection ip:          localhost
H2O Connection port:        54321
H2O Connection proxy:       NA
H2O Internal Security:      FALSE
H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4
R Version:                  R version 3.4.3 (2017-11-30)

> trainset=as.data.frame(matrix(ncol=572288,nrow=50,1))
> fwrite(trainset, "train.csv", sep=",")
>
> train=h2o.importFile("train.csv", sep=",")
|======================================================================|100%
> dim(train)
[1]     51 538177
> dim(trainset)
[1]     50 572288

It seems to me that I am running into some kind of memory issue when the lines are read back from the file. However, I have no idea how to overcome this problem.
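One way to narrow this down is to check whether the file on disk already has the wrong shape, or whether the columns only go missing once H2O parses it. The following is a minimal sketch in base R; `csv_shape()` is a hypothetical helper (not part of h2o or data.table), and it assumes a simple CSV without quoted fields that contain the separator or empty trailing fields.

```r
# Sketch: inspect the shape of a CSV on disk without involving H2O.
# csv_shape() is a hypothetical helper, not an h2o or data.table function.
csv_shape <- function(path, sep = ",") {
  lines <- readLines(path)
  # Count fields per line by splitting on the separator.
  # Assumes no quoted fields containing the separator.
  n_fields <- lengths(strsplit(lines, sep, fixed = TRUE))
  list(
    data_rows  = length(lines) - 1L,            # first line is the header
    columns    = n_fields[1],                   # fields in the header line
    consistent = all(n_fields == n_fields[1])   # same field count on every line
  )
}

# Small stand-in for the real file: 5 rows, 1000 columns of ones.
small <- as.data.frame(matrix(1, nrow = 5, ncol = 1000))
tmp <- tempfile(fileext = ".csv")
write.csv(small, tmp, row.names = FALSE)
shape <- csv_shape(tmp)
```

For the file in the question, `csv_shape("train.csv")` should report 50 data rows and 572288 columns if `fwrite()` wrote it completely. If it does, the columns are being lost in the H2O parse step rather than in the file itself.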

The final aim is to train a random forest.

Upvotes: 3

Views: 635

Answers (1)

Lauren

Reputation: 5778

This is likely a bug; I've created a Jira ticket for it here: https://0xdata.atlassian.net/browse/PUBDEV-5860.

Please feel free to update the ticket if you have a Jira account.

Upvotes: 2
