Uri Kluk
Uri Kluk

Reputation: 185

U-SQL: Array cannot be null using Avro Extractor

Array cannot be null using Avro Extractor

Using EventHub and capture to Blob Storage I have a function based on the AvroSamples that tries to transform the file.

This is my U-SQL script:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [log4net];
REFERENCE ASSEMBLY [Avro]; 
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];


DECLARE @ABI_DATE string = "2017/10/17/"; //replace by ADF pipeline
DECLARE @input_file string = "wasb://archive@sa/namespace/eh/{*}/" + @ABI_DATE +"{*}/{*}/{*}";
DECLARE @output_file string = @"/output/" + @ABI_DATE + "extract.csv";


@rs =
EXTRACT
        SequenceNumber long
        ,EnqueuedTimeUtc string
        ,Body byte[]
FROM @input_file
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
    {
        ""type"":""record"",
        ""name"":""EventData"",
        ""namespace"":""Microsoft.ServiceBus.Messaging"",
        ""fields"":[
            {""name"":""SequenceNumber"",""type"":""long""},
            {""name"":""Offset"",""type"":""string""},
            {""name"":""EnqueuedTimeUtc"",""type"":""string""},
            {""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
            {""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
            {""name"":""Body"",""type"":[""null"",""bytes""]}
        ]
    }
");

@cnt =
SELECT 
    SequenceNumber
    ,Encoding.UTF8.GetString(Body) AS Json   //THIS LINE BREAKS !!!!
    ,EnqueuedTimeUtc
FROM @rs;

OUTPUT @cnt TO @output_file USING Outputters.Text();

If I run the same extractor but comment out the Body field it works as expected.

This is the error:

Inner exception from user expression: Array cannot be null. Parameter name: bytes Current row dump: SequenceNumber: 4622 EnqueuedTimeUtc: NULL Body: NULL

Error while evaluating expression Encoding.UTF8.GetString(Body)

Upvotes: 2

Views: 322

Answers (1)

Uri Kluk
Uri Kluk

Reputation: 185

Florian Mander, gave me the explanation:

the extractor works correctly, you are just passing null values (intentionally, because it's in the schema) in a method (Encoding.GetString) that doesn't accept null as input. In your latest solution you will lose all the records that don't have a body, though. That's a non technical decision if this is fine or not.

So this is the way to fix it (using a WHERE clause)

@cnt =
SELECT 
    SequenceNumber
    ,Encoding.UTF8.GetString(Body) AS Json
    ,EnqueuedTimeUtc
FROM @rs
WHERE Body != null;

Upvotes: 2

Related Questions