Stijn

Reputation: 459

IllegalArgumentException: u'requirement failed: Invalid initial capacity' in Spark on Google DataProc

I am trying to fit an ML decision tree on a large dataset (30 million observations, 13 variables) in Spark 2.0.0 on Google DataProc. When I execute:

labelIndexer = StringIndexer(inputCol="Target", outputCol="indexedLabel").fit(data)

I receive the following error:

IllegalArgumentException: u'requirement failed: Invalid initial capacity'

I cannot find much information about this error online. Can somebody please explain what causes it and how I can resolve it?

Upvotes: 0

Views: 819

Answers (1)

Stijn

Reputation: 459

The error occurred because the input DataFrame (data) was defined but empty, so StringIndexer.fit had no rows to work with.
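A cheap guard before calling fit would have surfaced this with a clearer message. The following is only a minimal sketch: require_non_empty is a hypothetical helper name, not part of Spark, and it assumes data is a PySpark DataFrame. It relies on DataFrame.take(1), which fetches at most one row and so avoids a full count() over 30 million observations.

```python
def require_non_empty(df, name="data"):
    """Return df unchanged, or raise a clear error if it has no rows.

    Hypothetical helper (not a Spark API). It only assumes that `df`
    exposes `take(n)`, as a PySpark DataFrame does.
    """
    # take(1) returns a list with at most one row; an empty list means no rows.
    if len(df.take(1)) == 0:
        raise ValueError(
            "DataFrame '%s' is empty; check the step that loads or filters it" % name
        )
    return df


# Usage with the code from the question (needs a running Spark session):
# from pyspark.ml.feature import StringIndexer
# labelIndexer = StringIndexer(inputCol="Target", outputCol="indexedLabel") \
#     .fit(require_non_empty(data))
```

With this in place, an empty input fails fast with a ValueError naming the DataFrame instead of the less obvious 'Invalid initial capacity' message.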

Upvotes: 1
