Arun Lakhotia

Reputation: 51

H2O parser issue

I am using H2O 3.16.0.4 to parse the data from the Kaggle Toxic data classifier competition. The data is not parsed correctly even when I set the parser type to CSV and the separator to ",". Is this a product bug, or am I missing some configuration?

Upvotes: 1

Views: 149

Answers (1)

Lauren

Reputation: 5778

The issue is likely that the comment fields contain too many newlines, so unfortunately changing the separator will not help.
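To illustrate the kind of input involved, here is a small sketch using Python's standard csv module on made-up data (the column names and rows are hypothetical, not the actual Kaggle file). A quoted field can span several physical lines, so the file has more lines than records, and the separator setting has no bearing on that.

```python
import csv
import io

# Hypothetical sample: the comment field of the first row contains
# embedded newlines inside the quotes.
raw = 'id,comment_text\n1,"first line\nsecond line\nthird line"\n2,"single line"\n'

# Physical lines as a naive line-based reader would see them.
physical_lines = raw.splitlines()

# Logical records as a quote-aware CSV reader sees them (header included).
records = list(csv.reader(io.StringIO(raw)))

print(len(physical_lines))
print(len(records))
print(repr(records[1][1]))
```

The two counts differ because the quote-aware reader keeps the multi-line comment together as a single field.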

As a workaround, you can import the CSV with pandas using pandas.read_csv(), which parses it correctly. (Note: it does not work in data.table::fread() either, as reported here.)

To use the data frame in H2O for modeling, you just need to convert it to an H2O Frame (use df = h2o.H2OFrame(my_pandas_frame) in Python).
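A minimal sketch of the workaround might look like the following. The sample data and the helper name to_h2o_frame are made up for illustration, and the file name in the comment is an assumption; the H2O conversion is left commented out because it needs a running H2O cluster.

```python
import io

import pandas as pd


def to_h2o_frame(pandas_frame):
    """Convert a pandas DataFrame to an H2OFrame (hypothetical helper)."""
    import h2o  # imported lazily so the pandas part runs without H2O installed

    h2o.init()  # connects to a local H2O cluster, starting one if needed
    return h2o.H2OFrame(pandas_frame)


# Hypothetical stand-in for the Kaggle file. With the real data you would
# pass the file path instead, e.g. pd.read_csv("train.csv") (name assumed).
raw = 'id,comment_text,toxic\n1,"first line\nsecond line",0\n2,"single line",1\n'
my_pandas_frame = pd.read_csv(io.StringIO(raw))

print(my_pandas_frame.shape)

# With H2O installed and a cluster available:
# df = to_h2o_frame(my_pandas_frame)
```

Parsing in pandas first means H2O receives rows that are already split correctly, rather than having to interpret the raw file itself.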

I've created a JIRA ticket so that this issue can be tracked and worked on.

Upvotes: 1
