Reputation: 185
Array cannot be null using Avro Extractor
Using EventHub and capture to Blob Storage I have a function based on the AvroSamples that tries to transform the file.
This is my U-SQL script:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [log4net];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
DECLARE @ABI_DATE string = "2017/10/17/"; //replace by ADF pipeline
DECLARE @input_file string = "wasb://archive@sa/namespace/eh/{*}/" + @ABI_DATE +"{*}/{*}/{*}";
DECLARE @output_file string = @"/output/" + @ABI_DATE + "extract.csv";
@rs =
EXTRACT
SequenceNumber long
,EnqueuedTimeUtc string
,Body byte[]
FROM @input_file
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
{
""type"":""record"",
""name"":""EventData"",
""namespace"":""Microsoft.ServiceBus.Messaging"",
""fields"":[
{""name"":""SequenceNumber"",""type"":""long""},
{""name"":""Offset"",""type"":""string""},
{""name"":""EnqueuedTimeUtc"",""type"":""string""},
{""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
{""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
{""name"":""Body"",""type"":[""null"",""bytes""]}
]
}
");
@cnt =
SELECT
SequenceNumber
,Encoding.UTF8.GetString(Body) AS Json //THIS LINE BREAKS !!!!
,EnqueuedTimeUtc
FROM @rs;
OUTPUT @cnt TO @output_file USING Outputters.Text();
If I run the same extractor but comment out the Body field it works as expected.
This is the error:
Inner exception from user expression: Array cannot be null. Parameter name: bytes Current row dump: SequenceNumber: 4622 EnqueuedTimeUtc: NULL Body: NULL
Error while evaluating expression Encoding.UTF8.GetString(Body)
Upvotes: 2
Views: 322
Reputation: 185
Florian Mander, gave me the explanation:
the extractor works correctly, you are just passing null values (intentionally, because it's in the schema) in a method (Encoding.GetString) that doesn't accept null as input. In your latest solution you will lose all the records that don't have a body, though. That's a non technical decision if this is fine or not.
So this is the way to fix it (using a WHERE clause)
@cnt =
SELECT
SequenceNumber
,Encoding.UTF8.GetString(Body) AS Json
,EnqueuedTimeUtc
FROM @rs
WHERE Body != null;
Upvotes: 2