I've been trying to register and run a UDF with .NET for Apache Spark. I'm using Microsoft.Spark 0.10.0 on macOS.
This is what I've been trying:
var options = new Dictionary<string, string>
{
    { "delimiter", "|" }
};
var schema = "Username STRING, Machine STRING, Date STRING";
var df = spark
    .Read()
    .Format("csv")
    .Options(options)
    .Schema(schema)
    .Load(staff);
df.PrintSchema();
df.Show();
spark.Udf().Register<string, string>("MyUDF", randomFunc);
df.CreateOrReplaceTempView("AllLogs");
DataFrame dateDf = spark.Sql("SELECT *, MyUDF(alllogs.Username) FROM AllLogs");
dateDf.Collect();
And randomFunc is:
private static string randomFunc(string val)
{
    return "Random";
}
I keep getting the same error. I've tried different ways of creating UDFs, but none of them work.
This is the error:
[Error] [TaskRunner] [1] ProcessStream() failed with exception: System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.Spark.Utils.UdfSerDe.Deserialize(UdfData udfData) in /_/src/csharp/Microsoft.Spark/Utils/UdfSerDe.cs:line 168
at Microsoft.Spark.Utils.CommandSerDe.DeserializeUdfs[T](UdfWrapperData data, Int32& nodeIndex, Int32& udfIndex) in /_/src/csharp/Microsoft.Spark/Utils/CommandSerDe.cs:line 267
at Microsoft.Spark.Utils.CommandSerDe.Deserialize[T](Stream stream, SerializedMode& serializerMode, SerializedMode& deserializerMode, String& runMode) in /_/src/csharp/Microsoft.Spark/Utils/CommandSerDe.cs:line 243
at Microsoft.Spark.Worker.Processor.CommandProcessor.ReadSqlCommands(PythonEvalType evalType, Stream stream) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Processor\CommandProcessor.cs:line 190
at Microsoft.Spark.Worker.Processor.CommandProcessor.ReadSqlCommands(PythonEvalType evalType, Stream stream, Version version) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Processor\CommandProcessor.cs:line 117
at Microsoft.Spark.Worker.Processor.CommandProcessor.Process(Stream stream) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Processor\CommandProcessor.cs:line 62
at Microsoft.Spark.Worker.Processor.PayloadProcessor.Process(Stream stream) in D:\a\1\s\src\csharp\Microsoft.Spark.Worker\Processor\PayloadProcessor.cs:line 74