lserlohn

Reputation: 6206

How to debug a Scala-based Spark program in IntelliJ IDEA

I am currently setting up my development environment with IntelliJ IDEA. I followed exactly the steps described at http://spark.apache.org/docs/latest/quick-start.html

build.sbt file

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"

Sample Program File

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object MySpark {

    def main(args: Array[String]) {
        val logFile = "/IdeaProjects/hello/testfile.txt"
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        // Cache the file, since it is filtered twice below
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
        sc.stop()
    }
}

If I use command line:

sbt package

and then

spark-submit --class "MySpark" --master local[4] target/scala-2.11/myspark_2.11-1.0.jar

I am able to generate jar package and spark runs well.

However, I want to use IntelliJ IDEA to debug the program inside the IDE. How can I set up the configuration so that when I click "Debug", it automatically builds the jar package and launches the task by executing the spark-submit command line?

I just want everything to be as simple as one click on the Debug button in IntelliJ IDEA.

Thanks.

Upvotes: 16

Views: 19140

Answers (4)

Sandeep Purohit

Reputation: 3692

First, define an environment variable as below:

export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777 

Then create the debug configuration in IntelliJ IDEA as follows:

Run -> Edit Configurations -> click the "+" in the top left corner -> Remote -> set the port and name

After the above configuration, run the Spark application with spark-submit or sbt run, then start the debug configuration you just created and add breakpoints for debugging. Since suspend=y is set, the JVM waits for the debugger to attach before it starts executing.
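
Putting it together, the whole flow looks roughly like this (a sketch reusing the jar path from the question and the port 7777 from the export above):

# 1. Make the driver JVM listen for a debugger and wait before starting
export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777

# 2. Submit as usual; the process blocks until a debugger attaches on 7777
spark-submit --class "MySpark" --master local[4] target/scala-2.11/myspark_2.11-1.0.jar

# 3. In IntelliJ IDEA, start the Remote debug configuration to attach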

Upvotes: 25

balaudt

Reputation: 91

It is similar to the solution described in Debugging Spark Applications: you create a Remote debug run configuration in IDEA and pass Java debug parameters to the spark-submit command. The only catch is that you need to start the Remote debug configuration in IDEA after triggering the spark-submit command. I read somewhere that a Thread.sleep just before your breakpoint gives you time to attach, and I was able to use that suggestion successfully.
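
As a sketch of that variant (the suspend=n flag and the 10-second sleep are assumptions, not values from the answer), the debug agent can be passed directly to spark-submit:

spark-submit \
  --class "MySpark" \
  --master local[4] \
  --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=7777" \
  target/scala-2.11/myspark_2.11-1.0.jar

In the job itself, a pause just before the line of interest gives you the window to attach:

// Just before the code you want to inspect
Thread.sleep(10000) // assumption: 10 seconds is enough time to attach
val numAs = logData.filter(line => line.contains("a")).count()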

Upvotes: 1

Jeffrey

Reputation: 11

I've run into this when switching between 2.10 and 2.11. sbt expects the primary object to be in src/main/scala-2.10 or src/main/scala-2.11, depending on your Scala version.
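
For cross-built projects, the version-specific directories sit next to the shared source root (a sketch of the standard sbt layout; a single-version build only needs scala/):

src/
  main/
    scala/        # sources shared by all Scala versions
    scala-2.10/   # compiled only when scalaVersion is 2.10.x
    scala-2.11/   # compiled only when scalaVersion is 2.11.x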

Upvotes: 1

Alfredo Gimenez

Reputation: 2224

If you're using the Scala plugin and have your project configured as an sbt project, it should basically work out of the box.

Go to Run -> Edit Configurations... and add your run configuration normally.

Since you have a main class, you probably want to add a new Application configuration.

You can also just click the blue square icon to the left of your main code.

Once your run configuration is set up, you can use the Debug feature.
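
One caveat: an Application run configuration launches the main class directly, without spark-submit, so no master is set. A minimal sketch of falling back to a local master for IDE runs (the local[*] value and the object name are assumptions):

import org.apache.spark.{SparkConf, SparkContext}

object MySparkIde {
    def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Simple Application")
        // An IDE run configuration bypasses spark-submit, so no master is
        // supplied; fall back to local mode (assumption: local[*] is fine
        // for debugging)
        if (!conf.contains("spark.master"))
            conf.setMaster("local[*]")
        val sc = new SparkContext(conf)
        // ... rest of the job as in the question ...
        sc.stop()
    }
}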

Upvotes: 1
