Reputation: 3559
I am using Spark MLlib and doing classification with a logistic regression model. I followed this link: https://spark.apache.org/docs/2.1.0/ml-classification-regression.html#logistic-regression
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
// Load training data
Dataset<Row> training = spark.read().format("libsvm")
.load("data/mllib/sample_libsvm_data.txt");
LogisticRegression lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8);
// Fit the model
LogisticRegressionModel lrModel = lr.fit(training);
// Print the coefficients and intercept for logistic regression
System.out.println("Coefficients: "
+ lrModel.coefficients() + " Intercept: " + lrModel.intercept());
// We can also use the multinomial family for binary classification
LogisticRegression mlr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
.setFamily("multinomial");
// Fit the model
LogisticRegressionModel mlrModel = mlr.fit(training);
I am not sure how this model identifies the label and the features if I take a .csv file as input. Can anyone explain it?
Upvotes: 2
Views: 2156
Reputation: 3559
Finally I was able to fix it. I needed to use the VectorAssembler and StringIndexer transformers, which have setInputCol(s) and setOutputCol methods that provide a way to set the label and features columns.
// Read the CSV; inferSchema gives numeric columns real numeric types
Dataset<Row> dataset = sparkSession.read()
    .option("header", true).option("inferSchema", "true").csv("Book.csv");
// Turn the string "Status" column into a numeric "label" column
dataset = new StringIndexer().setInputCol("Status").setOutputCol("label")
    .fit(dataset).transform(dataset);
// Pack the numeric feature column(s) into a single "features" vector
VectorAssembler assembler = new VectorAssembler()
    .setInputCols(new String[]{"Lead ID"}).setOutputCol("features");
dataset = assembler.transform(dataset);
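For context: fit() works after these transformations because Spark ML estimators look for columns named "label" and "features" by default. If your prepared Dataset uses different column names, you can point the estimator at them instead of renaming. A minimal sketch (the column names "myLabel" and "myFeatures" are hypothetical, not from the original data):

```java
LogisticRegression lr = new LogisticRegression()
    .setMaxIter(10)
    .setLabelCol("myLabel")        // defaults to "label"
    .setFeaturesCol("myFeatures"); // defaults to "features"
LogisticRegressionModel model = lr.fit(dataset);
```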
Upvotes: 2
Reputation: 155
That is because you load data in libsvm format, where each line consists of label index1:value1 index2:value2 ... If you use .csv, you must specify the label and feature columns explicitly.
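For illustration, a libsvm file already encodes the label (first token) and sparse features (index:value pairs) on each line, so no column mapping is needed (the values below are made up):

```
0 128:51 129:159 130:253
1 159:124 160:253 161:255
```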
Upvotes: 2