Snow Ampl

Reputation: 21

Spark Int and Integer, udf

I just started learning Spark and tried to define a UDF for my DataFrame. The column is of type Long. The function is very simple:

def category: (Int => Int) = { a =>
  if (a < 7) {
    return a
  } else {
    if (a >= 7 && a < 14) {
      return 8
    } else {
      if (a >= 14 && a < 28) {
        return 9
      } else {
        return 10
      }
    }
  }
}

import org.apache.spark.sql.functions.udf
val myudf = udf(category)

val df_1 = df.withColumn("ncol", myudf($col))

It always reports these errors:

:78: error: type mismatch;
 found   : Int
 required: Int => Int
               return a
                      ^
:82: error: type mismatch;
 found   : Int(8)
 required: Int => Int
                   return 8
                          ^
:86: error: type mismatch;
 found   : Int(9)
 required: Int => Int
                       return 9
                              ^
:89: error: type mismatch;
 found   : Int(10)
 required: Int => Int
                       return 10
                              ^

Upvotes: 0

Views: 2796

Answers (2)

Leo C

Reputation: 22449

Given that the column type is Long, your category method should take a Long parameter rather than an Int. Here's how I would define the function:

def category: (Long => Long) = { a =>
  if (a < 7) a else
    if (a < 14) 8 else
      if (a < 28) 9 else
        10
}

val myudf = udf(category)
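Because category here is an ordinary Scala function, its bucket boundaries can be checked without starting Spark. A minimal sketch (the object name LongBuckets and the sample values are illustrative only), which also shows that a value too large for an Int is handled once the parameter is a Long:

```scala
object LongBuckets {
  // Same logic as the answer: values below 7 pass through, the rest map to 8, 9 or 10
  def category: (Long => Long) = { a =>
    if (a < 7) a else
      if (a < 14) 8 else
        if (a < 28) 9 else
          10
  }

  def main(args: Array[String]): Unit = {
    // The last sample does not fit in an Int, so an Int => Int version could not accept it
    val samples = Seq(3L, 7L, 20L, 3000000000L)
    // Prints: 3 -> 3, 7 -> 8, 20 -> 9, 3000000000 -> 10 (one per line)
    samples.foreach(v => println(s"$v -> ${category(v)}"))
  }
}
```

The literals 8, 9 and 10 are Int, but because the expected result type is Long the compiler widens them, so no explicit L suffix is needed.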

Or, you could just create the UDF in one code block:

val myudf = udf(
  (a: Long) =>
    if (a < 7) a else
      if (a >= 7 && a < 14) 8 else
        if (a >= 14 && a < 28) 9 else
          10
)
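The same bucketing can also be written as a pattern-matching function with guards, which some find easier to read than nested if/else. This is an alternative formulation, not part of the answer above, and the object name MatchBuckets is hypothetical; in Spark it would be wrapped with udf in the same way.

```scala
object MatchBuckets {
  // Cases are tried top to bottom, so each guard only needs an upper bound
  val category: Long => Long = {
    case a if a < 7  => a
    case a if a < 14 => 8L
    case a if a < 28 => 9L
    case _           => 10L
  }
}
```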

Upvotes: 1

puhlen

Reputation: 8529

Get rid of your return statements. return is unnecessary in most Scala code, because the value of the last expression in a block is returned automatically. This works for if as well: the whole if/else is an expression that evaluates to a value. Using return when you don't have to is discouraged, because it can cause unexpected behaviour like what you are seeing here.
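To illustrate the point about if being an expression: the value of an if/else is the value of whichever branch is taken, so a method body can simply end with it, and the result can also be bound directly to a val. A minimal sketch (the object IfExpression and the method describe are hypothetical):

```scala
object IfExpression {
  // No return: the if/else is the last expression, so its value is the method's result
  def describe(n: Int): String =
    if (n < 0) "negative"
    else if (n == 0) "zero"
    else "positive"

  def main(args: Array[String]): Unit = {
    val x = 5
    // The result of an if/else bound directly to a val
    val label = if (x > 3) "big" else "small"
    println(label)        // big
    println(describe(-2)) // negative
  }
}
```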

Let's simplify your code:

def category: (Int => Int) ={ a=>
  return a
}

This fails to compile. category is a method that takes no arguments and returns a function of type Int => Int. So far so good. But return a does not return from the function literal; it returns from the enclosing method category, with the value a, which is an Int. The method was declared to return an Int => Int, so returning an Int is a type mismatch, which is exactly the error you see. To fix this, just remove the return.
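The underlying rule is that return always exits the enclosing method (def), even when it is written inside a function literal. A minimal sketch of that behaviour, assuming Scala 2.13 (newer Scala 3 versions deprecate this kind of non-local return in favour of scala.util.boundary); firstAbove is a hypothetical helper:

```scala
object NonLocalReturn {
  // return inside the lambda passed to foreach exits firstAbove itself,
  // not just the current invocation of the lambda
  def firstAbove(xs: List[Int], limit: Int): Option[Int] = {
    xs.foreach { x =>
      if (x > limit) return Some(x)
    }
    None
  }
}
```

Here the returned value matches firstAbove's declared type Option[Int], so it compiles. In category the declared type is Int => Int, so returning an Int from inside the lambda does not.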

def category: (Int => Int) = { a=>
  a
}

Now the code compiles, because the last (and only) expression in our method is the function a => a, which is an Int => Int just as we wanted. The function itself is not very interesting; it just returns its input unchanged.
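Since category is a method with no parameter list whose result is a function value, that function can be bound to a val, passed to higher-order methods such as map, or applied directly. A small sketch (the object name IdentityCategory is hypothetical):

```scala
object IdentityCategory {
  // A parameterless method whose result is a function value of type Int => Int
  def category: (Int => Int) = { a => a }

  // The function itself, stored in a val
  val f: Int => Int = category

  // The function passed to map like any other Int => Int
  val mapped: List[Int] = List(1, 2, 3).map(category)
}
```

Writing category(7) also works: it evaluates category to get the function and then applies it to 7.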

Let's try it with your function:

def category: (Int => Int) = { a =>
  if (a < 7) {
    a
  } else {
    if (a >= 7 && a < 14) {
      8
    } else {
      if (a >= 14 && a < 28) {
        9
      } else {
        10
      }
    }
  }
}

Now it works because, instead of trying to return an Int from category from inside the function, the method's result is the entire function, which is what we want.
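A quick way to confirm that the fixed function behaves as intended is to call it on values on either side of each boundary. A sketch using the Int version from this answer, compacted into a flat else-if chain with the same logic (the redundant lower-bound checks are dropped because earlier branches already exclude those values); the object name IntBuckets is hypothetical, and for a Long column the types would become Long => Long as the other answer points out:

```scala
object IntBuckets {
  // Same branches as the answer, written as a flat else-if chain
  def category: (Int => Int) = { a =>
    if (a < 7) a
    else if (a < 14) 8
    else if (a < 28) 9
    else 10
  }

  def main(args: Array[String]): Unit = {
    // Values on either side of the boundaries 7, 14 and 28
    for (v <- Seq(6, 7, 13, 14, 27, 28))
      println(s"$v -> ${category(v)}")
  }
}
```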

Upvotes: 2
