I am currently trying to train an ML decision tree on a large dataset (30 million observations, 13 variables) with Spark 2.0.0 on Google Dataproc. When I execute:
labelIndexer = StringIndexer(inputCol="Target", outputCol="indexedLabel").fit(data)
I receive the following error:
IllegalArgumentException: u'requirement failed: Invalid initial capacity'
I could not find much information about this error online. Can somebody please explain what the problem is and how I can resolve it?
Answer:
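The error occurred because the input DataFrame (data) was defined but empty. Fitting StringIndexer on a DataFrame with no rows fails with this exception, so check that data actually contains rows before calling fit.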
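A minimal sketch of a guard that fails fast with a clearer message, assuming data is a PySpark DataFrame and the label column is named Target as in the question. The helper name fit_label_indexer is hypothetical; it uses only StringIndexer and DataFrame.head from PySpark.

```python
def fit_label_indexer(data, label_col="Target", output_col="indexedLabel"):
    """Fit a StringIndexer on label_col, raising a clear error if data is empty.

    data is expected to be a pyspark.sql.DataFrame (hypothetical helper).
    """
    # head(1) pulls at most one row to the driver, which is much cheaper than
    # count() on a large dataset. An empty list means the DataFrame has no rows.
    if len(data.head(1)) == 0:
        raise ValueError(
            "Input DataFrame is empty; check the load/filter steps that produced it "
            "before fitting StringIndexer."
        )

    # Imported here so the emptiness check can be exercised without Spark installed.
    from pyspark.ml.feature import StringIndexer

    indexer = StringIndexer(inputCol=label_col, outputCol=output_col)
    return indexer.fit(data)
```

Usage would be labelIndexer = fit_label_indexer(data). Checking head(1) rather than count() avoids a full pass over 30 million rows just to detect emptiness.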