user4955663

Reputation: 1071

How to remove comments from Scala code

Any ideas on how to remove comments from Scala code so that nested comments are handled and comment-like text inside string and char literals is left untouched?

Here is example code with comments:

object TestCode {

  val a = "A" // a = "AA"
  val b = "B" /* b = "BB" */
  val c = "C" /* multi line comment
  /* c = "CC" nested */ // FOO
  */ // c = "CCC"
  val d = """D""" // d = """DD /* """
  val e = '"' // e = '"' = char literal
  val f = '\"' // f = '\"' = char literal
  val codeStr = " \"  \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */" // This is a comment 3
  "// This is a literal with extra comment end string */" // This is a comment 4
  "/* This is a litral with extra comment begin string" // This is a comment 5

}

The code compiles (with warnings about pure expressions).

The C preprocessor gets quite close but fails on nested comments, leaving a stray */ in the output:

object TestCode {
  val a = "A"
  val b = "B"
  val c = "C"
  */
  val d = """D"""
  val e = '"'
  val f = '\"'
  val codeStr = " \"  \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */"
  "// This is a literal with extra comment end string */"
  "/* This is a litral with extra comment begin string"
}

I also tried this regex solution, but it fails on char literals that contain a quote and on nested comments, as you can see:

str.replaceAll("//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")", "$1")
res1: String = """object TestCode {

  val a = "A" 
  val b = "B" 
  val c = "C"  
  */ 
  val d = """D""" 
  val e = '"' // e = '"' = char literal
  val f = '\"' // f = '\"' = char literal
  val codeStr = " \"  \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */" 
  "// This is a literal with extra comment end string */" 
  "/* This is a literal with extra comment begin string" 

}"""

The Scala compiler can do the job, but as far as I understand there is no compiler option that only removes comments.
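
One reason both attempts trip is that Scala block comments nest, which takes a depth counter rather than a single non-greedy match. Below is a minimal hand-written scanner sketch of that idea, not a tested tool. The names NestedCommentStripper and strip are made up for illustration, and it assumes only plain strings, triple-quoted strings, and char literals (interpolated strings with quotes inside ${...} and backtick identifiers are not handled).

```scala
object NestedCommentStripper {

  // Removes line comments and (nested) block comments.
  // String and char literals are copied to the output verbatim.
  def strip(src: String): String = {
    val out = new StringBuilder
    val n = src.length
    var i = 0

    // Copies src[i, end) to the output and moves the cursor to end.
    def copyTo(end: Int): Unit = {
      val e = math.min(end, n)
      out ++= src.substring(i, e)
      i = e
    }

    while (i < n) {
      if (src.startsWith("//", i)) {
        // Line comment: skip to the end of the line, keep the newline.
        while (i < n && src.charAt(i) != '\n') i += 1
      } else if (src.startsWith("/*", i)) {
        // Block comment: count nesting depth until it returns to zero.
        var depth = 1
        i += 2
        while (i < n && depth > 0) {
          if (src.startsWith("/*", i)) { depth += 1; i += 2 }
          else if (src.startsWith("*/", i)) { depth -= 1; i += 2 }
          else i += 1
        }
      } else if (src.startsWith("\"\"\"", i)) {
        // Triple-quoted string: raw text up to the closing quotes,
        // plus any extra quotes that directly follow them.
        val close = src.indexOf("\"\"\"", i + 3)
        var end = if (close < 0) n else close + 3
        while (end < n && src.charAt(end) == '"') end += 1
        copyTo(end)
      } else if (src.charAt(i) == '"') {
        // Ordinary string: a backslash escapes the next character.
        var end = i + 1
        while (end < n && src.charAt(end) != '"' && src.charAt(end) != '\n')
          end += (if (src.charAt(end) == '\\') 2 else 1)
        copyTo(end + 1)
      } else if (src.charAt(i) == '\'') {
        // Char literal: either 'x' or an escape such as '\"' or '\u0041'.
        val end =
          if (i + 2 < n && src.charAt(i + 1) != '\\' && src.charAt(i + 2) == '\'') i + 3
          else if (i + 3 < n && src.charAt(i + 1) == '\\') {
            val close = src.indexOf('\'', i + 3)
            if (close < 0 || close > i + 7) i + 1 else close + 1
          } else i + 1 // not a char literal, copy the quote as is
        copyTo(end)
      } else {
        out += src.charAt(i)
        i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit =
    println(strip("val c = \"C\" /* outer /* inner */ still outer */ // tail"))
}
```

Only the comments are dropped; the whitespace around them stays, so a line that ended in a comment keeps a trailing space.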

Upvotes: 1

Views: 108

Answers (1)

user4955663

Reputation: 1071

Following Mateusz Kubuszok's suggestion, I implemented this with Scalameta.

This is the scala-cli script file, SourceCodeCommentRemover.scala:

//> using scala "2.13.5"
//> using lib "org.scalameta::scalameta:4.9.7"

import scala.meta._
import java.io.{File, PrintWriter}

object CommentRemover {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      println("Usage: CommentRemover <input file> <output file>")
      sys.exit(1)
    }

    val inputFile = new File(args(0))
    val outputFile = new File(args(1))

    if (!inputFile.exists()) {
      println(s"Input file ${inputFile.getAbsolutePath} does not exist.")
      sys.exit(1)
    }

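    val sourceCode = {
      // Local import: inside this block, Source means scala.io.Source, not scala.meta.Source.
      import scala.io.Source
      val in = Source.fromFile(inputFile)
      try in.mkString finally in.close()
    }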

    println(s"Original source: BEGIN\n${sourceCode}\nEND")

    val tree = sourceCode.parse[Source] match {
      case parsers.Parsed.Success(tree) => tree
      case parsers.Parsed.Error(_, msg, _) =>
        println(s"Failed to parse the input file: $msg")
        sys.exit(1)
    }

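    // Keep the text of every non-comment token. Whitespace tokens are kept too,
    // so the layout is unchanged and lines that ended in a comment keep a trailing space.
    val codeWithoutComments = tree.tokens.collect {
      case token if !token.is[Token.Comment] => token.text
    }.mkString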

    println(s"Comments removed: BEGIN\n${codeWithoutComments}\nEND")

    val writer = new PrintWriter(outputFile)
    try {
      writer.write(codeWithoutComments)
    } finally {
      writer.close()
    }

    println(s"Comments removed. Output written to ${outputFile.getAbsolutePath}.")
  }
}
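
The script above parses the whole file as a Source, so it only works on input that parses. If only the token stream is needed, Scalameta can also tokenize without parsing. This is a minimal sketch, assuming the tokenize extension method that import scala.meta._ brings into scope and the same Scalameta version as the script; TokenOnlyCommentRemover and stripComments are hypothetical names.

```scala
//> using scala "2.13.5"
//> using lib "org.scalameta::scalameta:4.9.7"

import scala.meta._

object TokenOnlyCommentRemover {
  // Tokenizes the input without parsing it, then drops comment tokens.
  // Note: .get throws if the input cannot be tokenized (e.g. an unclosed string).
  def stripComments(code: String): String =
    code.tokenize.get.collect {
      case token if !token.is[Token.Comment] => token.text
    }.mkString

  def main(args: Array[String]): Unit =
    println(stripComments("val c = \"C\" /* outer /* inner */ still outer */ // tail"))
}
```

Because no tree is built, this variant should also accept fragments that are not a complete source file, at the cost of losing the parse error check.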

This is the test input file, StackOverflowTestCode.scala:

object TestCode {

  val a = "A" // a = "AA"
  val b = "B" /* b = "BB" */
  val c = "C" /* multi line comment
  /* c = "CC" nested */ // FOO
  */ // c = "CCC"
  val d = """D""" // d = """DD /* """
  val e = '"' // e = '"' = char literal
  val f = '\"' // f = '\"' = char literal
  val codeStr = " \"  \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */" // This is a comment 3
  "// This is a literal with extra comment end string */" // This is a comment 4
  "/* This is a litral with extra comment begin string" // This is a comment 5

}

Run the script:

scala-cli run SourceCodeCommentRemover.scala -- StackOverflowTestCode.scala out.scala

cat out.scala

object TestCode {

  val a = "A" 
  val b = "B" 
  val c = "C"  
  val d = """D""" 
  val e = '"' 
  val f = '\"' 
  val codeStr = " \"  \"\" \"\"\"/* This is literal */\"\"\" val x = \"\"\"5\"\"\" \" "
  "/* This is a literal */" 
  "// This is a literal with extra comment end string */" 
  "/* This is a litral with extra comment begin string" 

}

Scala-cli version:

scala-cli --version
Scala CLI version: 1.4.1
Scala version (default): 3.4.2

Upvotes: 0
