sudhakar

Reputation: 26

Is there any way to check whether IndexToString assigns the same values when using the labels from a StringIndexerModel?

I have a Spark dataframe with two target classes in the column status ("Placed", "Not Placed").

I used StringIndexer to convert the above column to indices in a pipeline and executed it.

str_indexer = StringIndexer(inputCol="status", outputCol="status_index").fit(df_train).transform(df_train)

It assigned these values - {0: 'Placed', 1: 'Not Placed'}

I then used IndexToString to convert the prediction generated by the RandomForest algorithm back to a string, passing the labels produced by the StringIndexer above (str_indexer):

IndexToString(inputCol="prediction",outputCol="status",labels=loaded_model.stages[0].labels).transform(in_indexed)

  1. Is this the correct way to convert the indices back to strings?
  2. Does the above IndexToString assign the same values, i.e., 0 for "Placed" and 1 for "Not Placed"?
  3. Is there any way to test whether IndexToString assigns the same values as StringIndexer?

Upvotes: 0

Views: 351

Answers (1)

Raghu

Reputation: 1712

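Yes, this is the correct way. IndexToString maps each index i to labels[i], so passing the labels of the fitted StringIndexerModel reproduces the same mapping. To check it, transform your dataframe with IndexToString and then select the distinct pairs of the two columns: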

df.select('status','prediction').distinct().show()

This will show which string each prediction index was mapped to.
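To see why the round trip is consistent, here is a minimal plain-Python sketch of the behavior, not Spark code. The helper names fit_labels, string_to_index and index_to_string are hypothetical and exist only for illustration. It assumes the default StringIndexer ordering (most frequent label gets index 0) and that IndexToString simply looks up labels[index].

```python
from collections import Counter


def fit_labels(values):
    """Model of fitting: order distinct labels by descending frequency.

    Ties are broken alphabetically here; that is an assumption of this
    sketch and does not matter for the example data below.
    """
    counts = Counter(values)
    return sorted(counts, key=lambda label: (-counts[label], label))


def string_to_index(values, labels):
    """Model of the StringIndexer transform: label -> position in labels."""
    lookup = {label: i for i, label in enumerate(labels)}
    return [float(lookup[v]) for v in values]


def index_to_string(indices, labels):
    """Model of IndexToString: index i -> labels[i]."""
    return [labels[int(i)] for i in indices]


# Hypothetical sample data with the two classes from the question.
statuses = ["Placed", "Not Placed", "Placed", "Placed", "Not Placed"]

labels = fit_labels(statuses)
indexed = string_to_index(statuses, labels)
restored = index_to_string(indexed, labels)

# The index -> string mapping implied by the labels list.
mapping = dict(enumerate(labels))
print(mapping)
print(restored == statuses)
```

Because both directions use the same labels list, the mapping cannot drift as long as the labels passed to IndexToString come from the same fitted model, as with loaded_model.stages[0].labels in the question.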

Upvotes: 0
