jeancallisti

Reputation: 1673

How to create a JSON object from a set of items

In Azure Data Factory, I have a Pipeline (made of flowlets, but that's a technicality) that implements more or less the following flow:

  1. Get a set of items from a Data Set (let's say I get 5 cars, each car with its "columns" -- id, color, model, ...)

  2. Turn that set into an array: I do it with an Aggregate block that contains a "collect" script function (sketched just below).
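
For readers less familiar with ADF, here is a rough Python illustration (hypothetical data, not actual ADF code) of what that Aggregate + collect step conceptually produces:

def collect(rows):
    # Fold the individual input rows into one single array value.
    collected = []
    for row in rows:
        collected.append(row)
    return collected

cars = collect([
    {"id": "1", "model": "", "color": "red"},
    {"id": "2", "model": "", "color": "blue"},
])
# cars is now a single value: [{"id": "1", ...}, {"id": "2", ...}]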

What I want:

I would like step 2 to create an object, not an array.

If this were JSON, this is what I would like:

//NOT THIS
[
 { "id":"1", "model":"", "color":"red" },
 { "id":"2", "model":"", "color":"blue" }
]


//THIS
{
 "1": { "model":"", "color":"red" },
 "2": { "model":"", "color":"blue" }
}

I've tried working with the objects as strings and then using a bunch of "replace" calls to turn [ ] into { } ... but that's just too much grinding -- and more importantly, there's too high a risk of making a mistake with character escaping.

How would you turn a set of items into one object instead of an array?

Note: the end goal is to later be able to work with the cars as a dictionary rather than a collection, in programming terms. I just added this for anyone who might be googling this without knowing exactly what they're looking for.
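
Since that note mentions the dictionary-versus-collection distinction, here is a quick Python illustration (hypothetical data) of why the second shape is handier:

cars_as_array = [
    {"id": "1", "model": "", "color": "red"},
    {"id": "2", "model": "", "color": "blue"},
]

cars_as_dict = {
    "1": {"model": "", "color": "red"},
    "2": {"model": "", "color": "blue"},
}

# With the array, finding car "2" means scanning the whole collection:
car = next(c for c in cars_as_array if c["id"] == "2")

# With the dictionary, it's a direct lookup by key:
car = cars_as_dict["2"]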

Upvotes: 0

Views: 1729

Answers (2)

jeancallisti

Reputation: 1673

The other answer says that the question is nonsensical, but that's not entirely true.

If your goal is really to create a dictionary, which is to say that you really want one column per Id (i.e. one output column per input row), then you can investigate the following Azure Data Factory feature: "rows to columns", a.k.a. pivot. For example: Rows to Columns in ADF

Pay close attention to the explanations on "Allow schema drift" to manage the evolution of the schema as the columns get transformed.

Note: of course, you want to be sure that there will be a reasonably low number of input Ids -- you don't want to create one giant dictionary. I don't know how ADF would react to that, performance-wise.

Note: in this answer I'm discussing only the Ids transformation, but of course there's still the matter of turning the other fields/columns into the value object meant to match each key. That would be done the same way as in the other answer (look at the part dealing with "ALL OTHER FIELDS").
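
To give an idea of the result, here is a rough Python sketch of the rows-to-columns idea (hypothetical data; ADF's actual pivot transformation is configured in the UI, not written as code):

rows = [
    {"id": "1", "model": "", "color": "red"},
    {"id": "2", "model": "", "color": "blue"},
]

# Rows to columns: each input row's id becomes an output column name,
# whose value is built from the remaining fields of that row.
pivoted = {
    row["id"]: {k: v for k, v in row.items() if k != "id"}
    for row in rows
}
# pivoted == {"1": {"model": "", "color": "red"},
#             "2": {"model": "", "color": "blue"}}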

Upvotes: 0

jeancallisti

Reputation: 1673

The question is nonsensical. This answer is meant to help anyone else understand why.

Data Factory works with schemas. Each column is strongly typed.

For example, imagine that I started with a Data Set based on a JSON file:

[
 { "id":"1", "model":"", "color":"red" },
 { "id":"2", "model":"", "color":"blue" }
]

Then those values are immediately named and converted to known data types: there's a column called "id", another called "model", and a third called "color". They have names. There's a pattern. Each row follows the same pattern.

The only thing that's allowed not to have a name is an array, because we know it's a set of items that all share the same schema (i.e. the same column names).

But if you turn it into a dictionary, you break the pattern: you now have an object whose fields (columns) have unpredictable names.

{
 "1": { "model":"", "color":"red" },
 "2": { "model":"", "color":"blue" }
}

For example, in the dictionary just above there's now a field (column) named "1", another named "2", and potentially another named "3"... There's no pattern to that object. We don't know how many fields it can have. Remember: that's not an array, it's meant to be an object, and you're supposed to know how many fields an object has if you plan on typing it. It would be even worse if the Ids were Guids: completely scrambled, unpredictable field/column names.

In other words, you cannot infer a type for that JSON data, and therefore you cannot parse it into an object that has a schema to feed to your flow in Data Factory.
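
To make the distinction concrete, here is how it would look in Python type hints (an illustration only; "Car" is a hypothetical name):

from typing import TypedDict

class Car(TypedDict):
    model: str
    color: str

# The array case: one fixed, inferable item type for every element.
cars_as_array: list[Car]

# The dictionary case: the keys ("1", "2", ...) are data, not schema,
# so a schema-based tool cannot enumerate the fields ahead of time.
cars_as_dict: dict[str, Car]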

=================

That being said, you can still create the dictionary as a string that you can parse later, outside of Data Factory (e.g. in C#, in the backend that reacts to insertions into the sink Dataset).

Here is how you get it:

  1. Create a dummy column to store the new so-called JSON string. To do so, use the usual Data Factory technique: create a "derived column" block in your flow, give a name to the new column (e.g. "asStringDictionary"), then click on "Expression builder" just under the "Expression" field of your dummy column.


  2. In the expression builder, use an expression like the one below. In essence, use the "reducer" pattern:

    reduce(
        arrayItems,
        "",
        #acc + toString(#item) + ",",
        concat( char(123), #result, char(125) )
    )

Of course, replace "arrayItems" with the name of your own set of items, i.e. the output of the block just before the Derived Column block you just added. (char(123) and char(125) are the ASCII codes for the { and } characters.)
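
For intuition, here is a rough Python equivalent of what that reduce builds (hypothetical data; I'm also assuming each collected item stringifies to a '"key": {...}' fragment, which is a simplification of what toString() renders in ADF):

from functools import reduce

array_items = ['"1": { "model":"", "color":"red" }',
               '"2": { "model":"", "color":"blue" }']

# Mirrors the ADF expression: fold the items into acc + item + ",",
# then wrap the result in braces (char(123) / char(125)).
folded = reduce(lambda acc, item: acc + str(item) + ",", array_items, "")
as_string_dictionary = "{" + folded + "}"

print(as_string_dictionary)
# {"1": { "model":"", "color":"red" },"2": { "model":"", "color":"blue" },}

Note the trailing comma before the closing brace: that's why this is only a "so-called" JSON string, and why whoever parses it later should strip or tolerate it (see the parsing sketch at the end).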

Please note that in my case, the following transformation had already been performed in a previous block:

from
{ "id" : "1", ... ALL OTHER FIELDS ... }
to
{  "1": { ... ALL OTHER FIELDS ... }  }

I did it in yet another derived column, this time with an expression like:

associate(id, @())
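
For completeness, here is a minimal sketch of the later parsing step outside Data Factory, shown in Python for illustration (the suggestion above was e.g. C# in the backend), assuming the string looks like the pseudo-JSON produced earlier:

import json
import re

raw = '{"1": { "model":"", "color":"red" },"2": { "model":"", "color":"blue" },}'

# The reduce leaves a trailing comma, so strip any comma that directly
# precedes a closing brace or bracket before handing the string to the parser.
cleaned = re.sub(r",\s*([}\]])", r"\1", raw)
cars = json.loads(cleaned)

print(cars["2"]["color"])  # "blue" -- dictionary-style lookup by Id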

Upvotes: 0
