Ricky
Ricky

Reputation: 2750

How to set data for logistic regression in scala?

I am new to scala and I want to implement a logistic regression model.So initially I load a csv file as below:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
 val df = sqlContext.read.format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("D:/sample.txt")

The file is as below:

P,P,A,A,A,P,NB
N,N,A,A,A,N,NB
A,A,A,A,A,A,NB
P,P,P,P,P,P,NB
N,N,P,P,P,N,NB
A,A,P,P,P,A,NB
P,P,A,P,P,P,NB
P,P,P,A,A,P,NB
P,P,A,P,A,P,NB
P,P,A,A,P,P,NB
P,P,P,P,A,P,NB
P,P,P,A,P,P,NB
N,N,A,P,P,N,NB
N,N,P,A,A,N,NB
N,N,A,P,A,N,NB
N,N,A,P,A,N,NB
N,N,A,A,P,N,NB
N,N,P,P,A,N,NB
N,N,P,A,P,N,NB
A,A,A,P,P,A,NB
A,A,P,A,A,A,NB
A,A,A,P,A,A,NB
A,A,A,A,P,A,NB
A,A,P,P,A,A,NB
A,A,P,A,P,A,NB
P,N,A,A,A,P,NB
N,P,A,A,A,N,NB
P,N,A,A,A,N,NB
P,N,P,P,P,P,NB
N,P,P,P,P,N,NB

Then I want to train the model by below code:

val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .setFeaturesCol("Feature")
      .setLabelCol("Label")

Then I fit the model by below:

 val lrModel = lr.fit(df)

println(lrModel.coefficients +"are the coefficients")
println(lrModel.interceptVector+"are the intercerpt vactor")
println(lrModel.summary +"is summary")

But it is not printing the results.

Any help is appreciated.

Upvotes: 0

Views: 810

Answers (1)

vdep
vdep

Reputation: 3590

from your code:

val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .setFeaturesCol("Feature")  <- here
      .setLabelCol("Label") <- here

you are setting features column and label column. As you didn't mention column names, i am assuming the column containing NB values is your label and you want to include all others are the columns for prediction.

All predictor variables that you want include in your model, needs to be in form of single vector column, generally called as features column. You need to create it using VectorAssembler as follows:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

//creating features column
val assembler = new VectorAssembler()
  .setInputCols(Array(" insert your column names here "))
  .setOutputCol("Feature")

Refer: https://spark.apache.org/docs/latest/ml-features.html#vectorassembler.

Now you can proceed to fit the logistic regression model. pipeline is used to combine multiple transformations beforefitting the data.

val pipeline = new Pipeline().setStages(Array(assembler,lr))

//fitting the model
val lrModel = pipeline.fit(df)

Upvotes: 1

Related Questions