oyugi.collins
oyugi.collins

Reputation: 77

Dealing With Incoming Null Values In Cloud Data Fusion When Building Data Pipeline

I have started trying out google cloud data fusion as a prospect ETL tool that I can finally decide to use.When building a data pipeline to fetch data from a REST API source and load it to a MySQL database am facing this error Expected a string but was NULL at line 1 column 221'. Please check the system logs for more details. and yes it's true I have a field that is null from the JSON response am seeing

"systemanswertime": null

How do I deal with null values since the available dropdown in the cloud data fusion studio string is not working are they other optional data types that I can use? Below are two screenshots showing my current data pipeline structure geneneral view

view showing mapping and the output schema

Thank You!!

Upvotes: 0

Views: 2755

Answers (3)

Malaman
Malaman

Reputation: 184

In case you are facing an error such as: No matching schema found for union type: ["string","null"], you could try the following workaround. The root cause of this errors are when the entries in the response from the API doesn't have all the fields it needs to have. For example, some entries may have callerId, channel, last_channel, last data, etc... but others entries may have not have last_channel or whatever other field from the JSON. This leads to a mismatch in the schema provided in the HTTP source and the pipeline fails right away.

As pear this when nodes encounter null values, logical errors, or other sources of errors, you may use an error handler plugin to catch errors. The way is as following:

In the HTTP source plug-in, change the following:

  • Output schema to account for custom field.
  • JSON/XML field mapping to account into custom field. Changed Non-HTTP Error Handling field to Send to Error. This way it pushes the records through error collector and the pipeline proceeds with subsequent records.
  • Added Error Collector and a sink to capture the error records. With this method you will be able to run the pipeline and had the problematic fields detected.

Kind regards, Manuel

Upvotes: 0

Kamaro
Kamaro

Reputation: 1015

What you need to do is to tell HTTP plugin that you are expecting a null by checking the null checkbox in front of output on the right side. See below example enter image description here

Upvotes: 2

davidmesalpz
davidmesalpz

Reputation: 131

You might be getting this error because in the JSON schema you are defining the value properties. You should allow systemanswertime parameter to be NULL.

You could try to parse the JSON value as follow:

    "systemanswertime": {
      "type": [
        "string",
        "null"
      ]
    }

In the case you don't have access to the JSON file, you could try to use this plug in in order to enable the HTTP to manage nulleable values by dynamically substituting the configurations that can be served by the HTTP Server. You will need access to the HTTP endpoint in order construct an accessible HTTP endpoint that can serve content similar to:

{
      "name" : "output.schema", "type" : "schema", "value" : 
      [
        { "name" : "id", "type" : "int", "nullable" : true},
        { "name" : "first_name", "type" : "string", "nullable" : true},
        { "name" : "last_name", "type" : "string", "nullable" : true},
        { "name" : "email", "type" : "string", "nullable" : true},
      ]
},

Upvotes: 0

Related Questions