I have a Spark DataFrame whose status column contains two target classes ("Placed", "Not Placed").
I used StringIndexer in a pipeline to convert this column to numeric indices and executed it:
str_indexer = StringIndexer(inputCol="status", outputCol="status_index").fit(df_train).transform(df_train)
It assigned these indices: {0: 'Placed', 1: 'Not Placed'}
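The index order is not arbitrary: by default StringIndexer uses stringOrderType="frequencyDesc", so the most frequent class receives index 0 (recent Spark versions also break frequency ties alphabetically). The snippet below is a plain-Python model of that ordering, not Spark code; `frequency_desc_labels` and the sample data are hypothetical and only illustrate the rule.

```python
from collections import Counter


def frequency_desc_labels(values):
    """Hypothetical model of StringIndexer's default 'frequencyDesc' ordering.

    Returns a list where labels[i] is the string that would receive index i:
    most frequent first, ties broken alphabetically.
    """
    counts = Counter(values)
    # Sort by descending count, then alphabetically for equal counts.
    return sorted(counts, key=lambda label: (-counts[label], label))


# Made-up sample: 'Placed' occurs 3 times, 'Not Placed' twice.
statuses = ["Placed", "Placed", "Not Placed", "Placed", "Not Placed"]
labels = frequency_desc_labels(statuses)
# StringIndexer emits doubles, so the indices are modelled as floats here.
index_of = {label: float(i) for i, label in enumerate(labels)}
print(labels)    # ['Placed', 'Not Placed']
print(index_of)  # {'Placed': 0.0, 'Not Placed': 1.0}
```

If the training data had more "Not Placed" rows, the two indices would swap, which is why the labels used for decoding should be read from the fitted model rather than hard-coded.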
I then used IndexToString to convert the prediction column produced by the RandomForest model back to strings, passing the labels learned by the fitted StringIndexer (the first stage of the saved pipeline model, loaded_model.stages[0]):
IndexToString(inputCol="prediction", outputCol="status", labels=loaded_model.stages[0].labels).transform(in_indexed)
Is this the correct way to map the predictions back to the original class names?
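Conceptually, IndexToString treats each prediction value as a position in the labels array: prediction 0.0 becomes labels[0], 1.0 becomes labels[1], and so on. The function below is a hypothetical plain-Python sketch of that per-row lookup, not the Spark implementation; it assumes predictions are doubles, as Spark ML classifiers produce, and `index_to_string` is an illustrative name.

```python
def index_to_string(predictions, labels):
    """Hypothetical plain-Python model of IndexToString's per-row lookup.

    Each prediction (a double such as 0.0 or 1.0) is cast to int and used
    as a position in `labels`, the ordered list from the fitted indexer.
    """
    out = []
    for p in predictions:
        i = int(p)
        # This sketch rejects indices that have no corresponding label.
        if not 0 <= i < len(labels):
            raise ValueError(f"prediction {p} has no label in {labels}")
        out.append(labels[i])
    return out


labels = ["Placed", "Not Placed"]  # same order as the fitted indexer's labels
print(index_to_string([0.0, 1.0, 1.0, 0.0], labels))
# ['Placed', 'Not Placed', 'Not Placed', 'Placed']
```

The lookup only yields correct names if `labels` is in the same order the indexer used to encode the training label, which is why taking it from loaded_model.stages[0].labels is safer than writing the list by hand.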
Yes, this is the correct way. To check the mapping, transform your DataFrame with IndexToString and then run:
df.select('status','prediction').distinct().show()
Each row of the output pairs a prediction index with its string label, which gives you the mapping.
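The distinct (status, prediction) pairs can also be checked programmatically. The sketch below is a hypothetical plain-Python analogue that works on (status, prediction) tuples, for example the rows returned by df.select('status', 'prediction').distinct().collect(), assuming the columns are selected in that order; `distinct_mapping` is an illustrative name. It fails if an index appears with more than one string, or with a string other than labels[index].

```python
def distinct_mapping(rows, labels):
    """Hypothetical check on (status, prediction) pairs.

    Deduplicates the pairs, sorts them by prediction index, and verifies
    that every index maps to exactly one string equal to labels[index].
    """
    pairs = sorted(set(rows), key=lambda pair: pair[1])
    mapping = {}
    for status, prediction in pairs:
        # The same index must never appear with two different strings.
        if mapping.setdefault(prediction, status) != status:
            raise ValueError(f"index {prediction} maps to more than one label")
        expected = labels[int(prediction)]
        if status != expected:
            raise ValueError(f"index {prediction} -> {status!r}, expected {expected!r}")
    return pairs


rows = [("Placed", 0.0), ("Not Placed", 1.0), ("Placed", 0.0), ("Not Placed", 1.0)]
for status, prediction in distinct_mapping(rows, ["Placed", "Not Placed"]):
    print(prediction, status)
# 0.0 Placed
# 1.0 Not Placed
```

Keeping the distinct() before collecting means only one row per class reaches the driver, so this stays cheap even on a large DataFrame.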