Reputation: 13
I am using a Data Flow in Azure Data Factory to transform rows of CSV files into JSON documents of a standard layout. I can't figure out how to get rid of empty JSON objects and arrays when there is no data to populate.
For example if I have a CSV like below:
firstName,lastName,Email,Address1,City,State,Zip
Bob,Smith,[email protected],123 st,Somecity,TX,12345
I need to turn it into this:
{
"firstName": "Bob",
"lastName": "Smith",
"contactData": [
{
"contactType": "postalAddress",
"contactData": {
"postalAddress1": "123 st",
"postalCity": "Somecity",
"postalState": "TX",
"postalCode": "12345"
}
},
{
"contactType": "email",
"contactData": {
"emailAddress": "[email protected]"
}
}
]
}
I am using derived columns to build the subcolumns and arrays. I have been able to produce the JSON above. The problem I run into is that if an email or address is null, I want to remove the object from the array. If both are null, I want to remove the entire contactData object.
Example:
firstName,lastName,Email,Address1,City,State,Zip
Bob,Smith,,,,,
I need to turn it into this:
{
"firstName": "Bob",
"lastName": "Smith"
}
If I set all of the child objects to NULL with IF statements I can produce something like this:
{
"firstName": "Bob",
"lastName": "Smith",
"contactData": [
{
"contactData": {}
},
{
"contactData": {}
}
]
}
but I can't get rid of the entire section. The Sink will get rid of the empty string objects, but not the nested JSON objects and arrays. Is there any way to do this in ADF Data Flows?
Upvotes: 0
Views: 1316
Reputation: 6104
[{
"firstName": "Bob",
"lastName": "Smith",
"contactData": [
{
"contactType": "postalAddress",
"contactData": {
"postalAddress1": "123 st",
"postalCity": "Somecity",
"postalState": "TX",
"postalCode": "12345"
}
},
{
"contactType": "email",
"contactData": {
"emailAddress": "[email protected]"
}
}
]
},
{
"firstName": "b1",
"lastName": "s1",
"contactData": [
{
"contactData": {}
},
{
"contactData": {}
}
]
},
{
"firstName": "Bob1",
"lastName": "Smith1",
"contactData": [
{
"contactType": "postalAddress",
"contactData": {
"postalAddress1": "123 st1",
"postalCity": "Somecity1",
"postalState": "TX1",
"postalCode": "123456"
}
},
{
"contactType": "email",
"contactData": {
"emailAddress": "[email protected]"
}
}
]
},
{
"firstName": "b2",
"lastName": "s2",
"contactData": [
{
"contactData": {}
},
{
"contactData": {}
}
]
}]
contactType
. The rows without any contactType
would have the same length (convert to string and find length).tp : length(toString(contactData.contactType))
contactType
(converted to string) length is equal to 2 or not. The split condition would be as shown below:length(toString(contactData.contactType))!=2
noContact
stream would have the following data:condition : name!='contactData'
column name : $$
Name
on hasContact
and select1
stream.Upvotes: 2