user3803714

Reputation: 5389

Spark UDAF: java.lang.InternalError: Malformed class name

I am using Spark 1.5.0 from the CDH 5.5.2 distribution, and I switched from Scala 2.10.4 to 2.10.5. I am using the following code for a UDAF. Is this somehow a String vs. UTF8String issue? If so, any help would be greatly appreciated.

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import org.apache.spark.unsafe.types.UTF8String

import scala.collection.mutable.ArrayBuffer

object GroupConcat extends UserDefinedAggregateFunction {
    def inputSchema = new StructType().add("x", StringType)
    def bufferSchema = new StructType().add("buff", ArrayType(StringType))
    def dataType = StringType
    def deterministic = true

    def initialize(buffer: MutableAggregationBuffer) = {
      buffer.update(0, ArrayBuffer.empty[String])
    }

    def update(buffer: MutableAggregationBuffer, input: Row) = {
      if (!input.isNullAt(0))
        buffer.update(0, buffer.getSeq[String](0) :+ input.getString(0))
    }

    def merge(buffer1: MutableAggregationBuffer, buffer2: Row) = {
      buffer1.update(0, buffer1.getSeq[String](0) ++ buffer2.getSeq[String](0))
    }

    def evaluate(buffer: Row) = UTF8String.fromString(
      buffer.getSeq[String](0).mkString(","))
}

However, I get this error message at runtime:

Exception in thread "main" java.lang.InternalError: Malformed class name
at java.lang.Class.getSimpleName(Class.java:1190)
at org.apache.spark.sql.execution.aggregate.ScalaUDAF.toString(udaf.scala:464)
at java.lang.String.valueOf(String.java:2847)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at scala.StringContext.standardInterpolator(StringContext.scala:122)
at scala.StringContext.s(StringContext.scala:90)
at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression2.toString(interfaces.scala:96)
at org.apache.spark.sql.catalyst.expressions.Expression.prettyString(Expression.scala:174)
at org.apache.spark.sql.GroupedData$$anonfun$1.apply(GroupedData.scala:86)
at org.apache.spark.sql.GroupedData$$anonfun$1.apply(GroupedData.scala:80)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.GroupedData.toDF(GroupedData.scala:80)
at org.apache.spark.sql.GroupedData.agg(GroupedData.scala:227)

Upvotes: 6

Views: 8270

Answers (2)

pwb2103

Reputation: 540

I ran into this when it was caused by a conflict with packages I was importing. If you are importing anything, try testing in a spark-shell with nothing imported.

When you define your UDAF, check what the returned name looks like. It should be something like

FractionOfDayCoverage: org.apache.spark.sql.expressions.UserDefinedAggregateFunction{def dataType: org.apache.spark.sql.types.DoubleType.type; def evaluate(buffer: org.apache.spark.sql.Row): Double} = $anon$1@27506b6d

That $anon$1@27506b6d at the end is a reasonable name that will work. When I imported the conflicting package, the returned name was three times as long. Here is an example:

$$$$bec6d1991b88c272b3efac29d720f546$$$$anon$1@6886601d

Upvotes: 0

Azrrik

Reputation: 51

I received the same exception because my object extending UserDefinedAggregateFunction was defined inside another method.

Change this:

object Driver {
    def main(args: Array[String]) {

        object GroupConcat extends UserDefinedAggregateFunction {
            ...
        }
    }
}

To this:

object Driver {
    def main(args: Array[String]) {
        ...
    }

    object GroupConcat extends UserDefinedAggregateFunction {
        ...
    }
}
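Why moving the object to the top level helps can be seen from the generated class names. The sketch below is illustrative (NameCheck, MemberUdaf, and LocalUdaf are hypothetical names, not from the question): an object defined locally inside a method gets a compiler-mangled JVM class name with extra scope markers, and it is Class.getSimpleName's parsing of such mangled names that throws the InternalError on the Spark 1.5 / Scala 2.10 combination in the question. Class.getName, by contrast, never throws.

```scala
// Hedged sketch: compare the class name of a member object with that of an
// object declared locally inside a method (mimicking a UDAF defined in main()).
object NameCheck {
  // Member object: compiles to a class with a clean, predictable name.
  object MemberUdaf

  def memberUdafClassName: String = MemberUdaf.getClass.getName

  // Local object: the compiler adds extra scope markers to its class name,
  // which is the kind of name that getSimpleName can fail to parse.
  def localUdafClassName: String = {
    object LocalUdaf
    LocalUdaf.getClass.getName
  }
}
```

Printing both names in a spark-shell makes the extra mangling on the local one visible, which matches the overly long names described in the other answer.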

Upvotes: 5
