Reputation: 511
I'm trying to load the Google product taxonomy into Firestore documents, which I think primarily means converting it to JSON. This is a sample of the taxonomy:
1 - Animals & Pet Supplies
3237 - Animals & Pet Supplies > Live Animals
2 - Animals & Pet Supplies > Pet Supplies
3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies
7385 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories
I can figure out how to deal with the first level of the categories (see code below), but I cannot figure out how to recurse down through the other categories.
const taxonomy = { version: '2019-07-10', categories: [] }
for (const line of file) {
  // parse the line using -, > as dividers to create array lineItems
  const categoryLevel1 = {}
  categoryLevel1.id = lineItems[0]
  categoryLevel1.name = lineItems[1]
  categoryLevel1.categories = []
  if (!taxonomy.categories.find(category => category.name === categoryLevel1.name)) {
    taxonomy.categories.push(categoryLevel1)
  }
}
Upvotes: 1
Views: 556
Reputation: 50787
This question was just resurfaced because of a now-deleted answer. I hadn't seen it before. I'm quite impressed with the excellent answer from customcommander. I like the breakdown of the problem and the simple helper functions.
But I think we can write a simple nest function that also does what the deepmerge dependency does, and thus simplify our code.
So here's a similar approach, with such a nest function.
const splitBy = (sep) => (xs) =>
  xs .split (sep) .map (s => s .trim ())

const nest = ([p, ...ps], v, o) =>
  p == undefined ? o : {... o, [p] : ps .length == 0 ? v : nest (ps, v, o [p] || {})}

const convert = (lines) => lines
  .split ('\n')
  .filter (Boolean)
  .map (splitBy ('-'))
  .map (([id, desc]) => [id, splitBy ('>') (desc)])
  .reduce ((a, [id, path]) => nest ([... path, 'id'], id, a), {})

const lines = `
1 - Animals & Pet Supplies
3237 - Animals & Pet Supplies > Live Animals
2 - Animals & Pet Supplies > Pet Supplies
3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies
7385 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories`
console .log (convert (lines))
This version of nest is fairly simplistic. It doesn't have the sophistication of deepmerge or the one I'm more familiar with, Ramda's assocPath (disclaimer: I'm a Ramda author). But it is enough for this problem.
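To see how nest builds the tree one path at a time, a short trace may help. This is only a sketch: it repeats the nest definition from above so that it runs on its own, and the names step1 and step2 are made up for illustration.

```javascript
// Same definition as above, repeated so this snippet is self-contained.
const nest = ([p, ...ps], v, o) =>
  p == undefined ? o : {...o, [p]: ps.length == 0 ? v : nest(ps, v, o[p] || {})};

// First line: the path ['Animals & Pet Supplies', 'id'] receives the value '1'.
const step1 = nest(['Animals & Pet Supplies', 'id'], '1', {});

// Second line: the shared prefix is reused, so the existing id is kept
// and 'Live Animals' is added next to it.
const step2 = nest(['Animals & Pet Supplies', 'Live Animals', 'id'], '3237', step1);

console.log(JSON.stringify(step2));
// {"Animals & Pet Supplies":{"id":"1","Live Animals":{"id":"3237"}}}
```

Because every level is rebuilt with the spread operator, step1 is left unchanged, which is what makes nest safe to use as a reducer.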
Upvotes: 1
Reputation: 18901
I'm trying to load the Google product taxonomy into Firestore documents, which I think primarily means converting it to JSON.
I would suggest a first pass in which nested objects are created.
(You can always reprocess them to achieve the final desired structure.)
{
  "Animals & Pet Supplies": {
    "id": "1",
    "Live Animals": {
      "id": "3237"
    },
    "Pet Supplies": {
      "id": "2",
      "Bird Supplies": {
        "id": "3",
        "Bird Cage Accessories": {
          "id": "7385"
        }
      }
    }
  }
}
Why? It seems to me that these lines can be easily converted into nested objects that you can then merge together.
The following lines:
[ "1 - Animals & Pet Supplies",
"3237 - Animals & Pet Supplies > Live Animals" ]
can be transformed into:
[ {"Animals & Pet Supplies": {"id": "1"}},
  {"Animals & Pet Supplies": {"Live Animals": {"id": "3237"}}} ]
and then merged into:
{
  "Animals & Pet Supplies": {
    "id": "1",
    "Live Animals": {
      "id": "3237"
    }
  }
}
First we need two functions: one that separates the id from the category path, and one that splits the path into individual categories.
Let's start with a generic curried function:
const splitBy = sep => str =>
  str.split(sep).map(x => x.trim());
Because it is curried we can build two specialised functions on top of it:
const splitLine = splitBy('-');
const splitCategories = splitBy('>');
splitLine('1 - Animals & Pet Supplies');
//=> [ '1', 'Animals & Pet Supplies' ]
splitCategories('Animals & Pet Supplies > Live Animals');
//=> [ 'Animals & Pet Supplies', 'Live Animals' ]
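One caveat: splitBy('-') splits on every hyphen, so a category name that itself contains a hyphen would be cut in two. A possible workaround, sketched here under the assumption that the id is always followed by " - ", is to split only at the first occurrence. The name splitLineOnce and the second sample line are made up for illustration.

```javascript
// Hypothetical alternative to splitLine: split only at the first " - ",
// so hyphens inside category names are left alone.
const splitLineOnce = (line) => {
  const at = line.indexOf(' - ');
  if (at === -1) throw new Error(`Unexpected line format: ${line}`);
  return [line.slice(0, at).trim(), line.slice(at + 3).trim()];
};

console.log(splitLineOnce('1 - Animals & Pet Supplies'));
//=> [ '1', 'Animals & Pet Supplies' ]

// Made-up line with a hyphenated category name.
console.log(splitLineOnce('42 - Crafts > Pre-Cut Fabric'));
//=> [ '42', 'Crafts > Pre-Cut Fabric' ]
```

With splitBy('-') the second line would come back as three pieces ('42', 'Crafts > Pre', 'Cut Fabric'), and the last piece would be silently dropped further down the pipeline.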
Then let's convert each line into a data structure that allows us to create nested objects:
The following lines:
[ "1 - Animals & Pet Supplies",
"3237 - Animals & Pet Supplies > Live Animals" ]
can be converted to pairs where each pair represents an object and a pair can be contained inside another one:
[ ["Animals & Pet Supplies", "1"],
  ["Animals & Pet Supplies", ["Live Animals", "3237"]] ]
This function will convert a flat array into nested pairs before converting it into an object:
const nest = xs =>
  xs.length === 2
    ? typeof xs[1] === 'string'
      ? {[xs[0]]: {id: xs[1]}}
      : {[xs[0]]: nest(xs[1])}
    : nest([xs[0], xs.slice(1)]);
// the id must be a string: nest uses typeof to tell an id from a nested pair
nest(["Animals & Pet Supplies", "Live Animals", "3237"]);
// (internally) => ["Animals & Pet Supplies", ["Live Animals", "3237"]]
// (final output) => {"Animals & Pet Supplies": {"Live Animals": {"id": "3237"}}}
To merge this array of objects I'll use deepmerge. (But you can use anything else as long as it allows deep merging, as opposed to the shallow merging you'd get with the ... spread operator or Object.assign.)
deepmerge.all(
  [ {"Animals & Pet Supplies": {"id": "1"}},
    {"Animals & Pet Supplies": {"Live Animals": {"id": "3237"}}} ]);
//=> {
//=>   "Animals & Pet Supplies": {
//=>     "id": "1",
//=>     "Live Animals": {
//=>       "id": "3237"
//=>     }
//=>   }
//=> }
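To make the difference between shallow and deep merging concrete, here is a small comparison. The mergeDeep helper is a hypothetical, minimal stand-in written for this sketch. It only handles plain objects and is not how deepmerge is implemented.

```javascript
const a = {'Animals & Pet Supplies': {id: '1'}};
const b = {'Animals & Pet Supplies': {'Live Animals': {id: '3237'}}};

// Shallow merge: the second top-level value replaces the first, so id "1" is lost.
const shallow = {...a, ...b};

// Hypothetical minimal deep merge for plain objects only (no arrays, dates, etc.).
const isPlainObject = (x) => x !== null && typeof x === 'object' && !Array.isArray(x);
const mergeDeep = (left, right) =>
  Object.entries(right).reduce(
    (acc, [key, value]) => ({
      ...acc,
      [key]: isPlainObject(acc[key]) && isPlainObject(value)
        ? mergeDeep(acc[key], value) // both sides are objects: merge recursively
        : value,                     // otherwise the right-hand value wins
    }),
    left
  );

const deep = mergeDeep(a, b);

console.log(JSON.stringify(shallow));
// {"Animals & Pet Supplies":{"Live Animals":{"id":"3237"}}}
console.log(JSON.stringify(deep));
// {"Animals & Pet Supplies":{"id":"1","Live Animals":{"id":"3237"}}}
```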
Here's a function that will take your lines as an array and return an object of nested categories:
const load = lines =>
  // put all lines into a "container"
  // we want to process all lines together as opposed to each line individually
  [lines]
    // separate id and categories
    // e.g. ['3237', 'Animals & Pet Supplies > Live Animals']
    .map(lines => lines.map(splitLine))
    // split categories and put id last
    // e.g. ['Animals & Pet Supplies', 'Live Animals', '3237']
    .map(lines => lines.map(([id, cats]) => splitCategories(cats).concat(id)))
    // create nested objects
    // e.g. {"Animals & Pet Supplies": {"Live Animals": {"id": "3237"}}}
    .map(lines => lines.map(nest))
    // merge all objects into one
    .map(lines => deepmerge.all(lines))
    // pop the result out of the container
    .pop();
load(
[ "1 - Animals & Pet Supplies",
"3237 - Animals & Pet Supplies > Live Animals" ]);
//=> {
//=> "Animals & Pet Supplies": {
//=> "id": "1",
//=> "Live Animals": {
//=> "id": "3237"
//=> }
//=> }
//=> }
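load expects an array of lines. If the taxonomy arrives as a single string instead, it has to be split first. The sketch below assumes the text may contain blank lines and a leading comment line starting with "#". Both the toLines name and the sample header line are assumptions, not something taken from the question.

```javascript
// Hypothetical helper: turn raw file text into the array of lines load expects.
const toLines = (text) =>
  text
    .split(/\r?\n/)                   // handle both Unix and Windows line endings
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#')); // skip blanks and comments

// Sample text; the first line is an assumed header, used only for illustration.
const text = `# Google_Product_Taxonomy_Version: 2019-07-10
1 - Animals & Pet Supplies
3237 - Animals & Pet Supplies > Live Animals
`;

console.log(toLines(text));
//=> [ '1 - Animals & Pet Supplies',
//=>   '3237 - Animals & Pet Supplies > Live Animals' ]
```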
console.log(
JSON.stringify(
load(file_content),
null,
2
)
)
<script src="https://unpkg.com/[email protected]/dist/umd.js"></script>
<script>
const file_content = [
'1 - Animals & Pet Supplies',
'3237 - Animals & Pet Supplies > Live Animals',
'2 - Animals & Pet Supplies > Pet Supplies',
'3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies',
'7385 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories',
];
</script>
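If the final documents should follow the {id, name, categories} shape from the question, the nested object can be reprocessed in a second pass. The following is a minimal sketch, assuming no category is literally named "id". The toCategories helper is hypothetical and the sample data is a shortened version of the output above.

```javascript
// Hypothetical helper: turns the name-keyed tree into an array of
// {id, name, categories} objects, recursing into every key except "id".
const toCategories = (node) =>
  Object.entries(node)
    .filter(([key]) => key !== 'id')
    .map(([name, child]) => ({
      id: child.id,
      name,
      categories: toCategories(child),
    }));

const nested = {
  'Animals & Pet Supplies': {
    id: '1',
    'Live Animals': {id: '3237'},
    'Pet Supplies': {id: '2', 'Bird Supplies': {id: '3'}},
  },
};

const taxonomy = {version: '2019-07-10', categories: toCategories(nested)};
console.log(JSON.stringify(taxonomy, null, 2));
```

Leaf categories end up with an empty categories array, which matches the way the question initialises each level.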
Now that we have loaded the data into this structure, it may seem that traversing it will be awkward.
const data = {
  "Animals & Pet Supplies": {
    "id": "1",
    "Live Animals": {
      "id": "3237"
    },
    "Pet Supplies": {
      "id": "2",
      "Bird Supplies": {
        "id": "3",
        "Bird Cage Accessories": {
          "id": "7385"
        }
      }
    }
  }
}
It may well not be the best data structure we could come up with, and there's also the option of reprocessing it if we need to.
However, thanks to the Iterator protocol, we can shape our data without thinking (too much) about how it will be accessed.
Making our data "iterable" as per the Iterator protocol is easy and allows us to use JavaScript constructs such as the ... spread operator or the for...of loop:
const iterate = o => (
  { ...o
  , [Symbol.iterator]() {
      const entries = Object.entries(o).filter(([k, v]) => k !== 'id');
      return {
        next() {
          if (entries.length === 0) return {done: true};
          const [name, {id}] = entries.pop();
          return {done: false, value: {id, name}};
        }
      };
    }
  }
);
In this implementation, we'll return an object {id, name} at each iteration.
Let's access the first level:
for (let obj of iterate(data)) {
console.log(obj)
}
//=> { id: '1', name: 'Animals & Pet Supplies' }
Let's access the second level:
for (let obj of iterate(data['Animals & Pet Supplies'])) {
console.log(obj)
}
// { id: '2', name: 'Pet Supplies' }
// { id: '3237', name: 'Live Animals' }
Or we can use the ... spread operator to store directly into an array:
const level2 = [...iterate(data['Animals & Pet Supplies'])];
// [ { id: '2', name: 'Pet Supplies' },
//   { id: '3237', name: 'Live Animals' } ]
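The examples above visit one level at a time. To visit every level in a single pass, one option is a recursive generator, which also implements the Iterator protocol. This is a sketch rather than part of the approach above. The walk function and its path field are hypothetical additions.

```javascript
// Hypothetical helper: depth-first walk over every category in the tree.
// Yields {id, name, path}, where path lists the names of the ancestors.
function* walk(node, path = []) {
  for (const [name, child] of Object.entries(node)) {
    if (name === 'id') continue; // "id" holds the category id, not a subcategory
    yield {id: child.id, name, path};
    yield* walk(child, [...path, name]);
  }
}

const data = {
  'Animals & Pet Supplies': {
    id: '1',
    'Live Animals': {id: '3237'},
    'Pet Supplies': {
      id: '2',
      'Bird Supplies': {id: '3', 'Bird Cage Accessories': {id: '7385'}},
    },
  },
};

const all = [...walk(data)];
console.log(all.map(({id, name}) => `${id} ${name}`));
// [ '1 Animals & Pet Supplies',
//   '3237 Live Animals',
//   '2 Pet Supplies',
//   '3 Bird Supplies',
//   '7385 Bird Cage Accessories' ]
```

Unlike the pop-based iterator, this version yields categories in insertion order, parents before children.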
Upvotes: 1