Reputation: 2854
I currently have items in the following structure:
[{
  "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
  "name" => "Robertson Merlot",
  "barcode" => '123456789-000',
  "wine_farm" => "Robertson Wineries",
  "price" => 60.00
}]
I have made up this data, but the data I am using is in the same structure and I cannot change the data coming in.
I have more than 100,000 of these.
Each product is nested in between 1 and n (unlimited) categories.
Because of the tabular nature of this data, the categories are repeated for every product. I want to use a tree structure to avoid this repetition and cut the file size by 25 to 30%.
I am aiming at a tree structure something like this:
{
  "type" => "category",
  "properties" => {
    "name" => "Alcoholic Beverages"
  },
  "children" => [{
    "type" => "category",
    "properties" => {
      "name" => "Wine"
    },
    "children" => [{
      "type" => "category",
      "properties" => {
        "name" => "Red Wine"
      },
      "children" => [{
        "type" => "product",
        "properties" => {
          "name" => "Robertson Merlot",
          "barcode" => '123456789-000',
          "wine_farm" => "Robertson Wineries",
          "price" => 60.00
        }
      }]
    }]
  }]
}
I can't seem to think of an efficient algorithm to get this right. I would appreciate any help in the right direction.
Should I be generating IDs and adding the parent ID to each node? I am concerned that using IDs will add more length to the text, which I am trying to shorten.
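One way to judge the ID approach is to build it on a small sample and compare the serialized size against the original rows. The following is only a sketch: the second sample row is made up, and `flat_with_ids` and its `id`/`parent_id` keys are hypothetical names, not part of any existing format.

```ruby
require 'json'

# Sample rows in the same shape as the data above (second row is made up).
rows = [
  { 'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
    'name' => 'Robertson Merlot', 'barcode' => '123456789-000',
    'wine_farm' => 'Robertson Wineries', 'price' => 60.00 },
  { 'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
    'name' => 'Example Shiraz', 'barcode' => '123456789-001',
    'wine_farm' => 'Example Estate', 'price' => 75.00 }
]

# Hypothetical helper: one flat list of nodes, each pointing at its parent by id.
def flat_with_ids(rows)
  nodes = []
  ids = {} # category path (Array of names) => id of that category node
  rows.each do |row|
    parent = nil
    row['category'].each_with_index do |name, depth|
      path = row['category'][0..depth]
      unless ids.key?(path)
        ids[path] = nodes.size + 1
        nodes << { 'id' => ids[path], 'parent_id' => parent,
                   'type' => 'category', 'name' => name }
      end
      parent = ids[path]
    end
    props = row.reject { |key, _| key == 'category' }
    nodes << { 'id' => nodes.size + 1, 'parent_id' => parent,
               'type' => 'product', 'properties' => props }
  end
  nodes
end

nodes = flat_with_ids(rows)
puts "original rows: #{JSON.generate(rows).bytesize} bytes"
puts "nodes with ids: #{JSON.generate(nodes).bytesize} bytes"
```

Running the same comparison on a realistic slice of the data should show whether the extra `id` and `parent_id` fields outweigh the saving from not repeating category names.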
Upvotes: 1
Views: 169
Reputation: 2302
There are probably easier ways of doing this, but this is all I can think of for now. It should match your structure.
require 'json'
# Initial set up, it seems the root keys are always the same looking at your structure
products = {
  'type' => 'category',
  'properties' => {
    'name' => 'Alcoholic Beverages'
  },
  'children' => []
}
data = [{
  "category" => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
  "name" => 'Robertson Merlot',
  "barcode" => '123456789-000',
  "wine_farm" => 'Robertson Wineries',
  "price" => 60.00
}]
data.each do |item|
  # Make sure we set the current level to the top level again
  curr = products['children']
  # Remove first entry as it's always 'Alcoholic Beverages'
  item['category'].shift
  item['category'].each do |category|
    # Get the index for the category if it exists
    index = curr.index { |x| x['type'] == 'category' && x['properties']['name'] == category }
    # If it exists then change current hash level to the children of that category
    if index
      curr = curr[index]['children']
    # Else add it in
    else
      curr << {
        'type' => 'category',
        'properties' => {
          'name' => category
        },
        'children' => []
      }
      # We can use last as we know it'll be the last index.
      curr = curr.last['children']
    end
  end
  # Delete category from the item itself
  item.delete('category')
  # Add the item as product type to the last level of the hash
  curr << {
    'type' => 'product',
    'properties' => item
  }
end
puts JSON.pretty_generate(products)
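With more than 100,000 items, the `index` scan over siblings could become slow when a category has many children. A possible variation, sketched here with made-up names (`build_tree`, `index`) and made-up extra sample rows, keeps a lookup hash keyed by the full category path. It also returns an array of root categories instead of hard-coding 'Alcoholic Beverages', and it does not modify the incoming items.

```ruby
require 'json'

# Hypothetical helper: returns an array of root category nodes.
def build_tree(items)
  roots = []
  index = {} # category path (Array of names) => that category's 'children' array
  items.each do |item|
    children = roots
    path = []
    item['category'].each do |name|
      path += [name] # build a new array each time so stored keys never change
      unless index.key?(path)
        node = { 'type' => 'category',
                 'properties' => { 'name' => name },
                 'children' => [] }
        children << node
        index[path] = node['children']
      end
      children = index[path]
    end
    # Copy everything except 'category' into the product node
    props = item.reject { |key, _| key == 'category' }
    children << { 'type' => 'product', 'properties' => props }
  end
  roots
end

data = [
  { 'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
    'name' => 'Robertson Merlot', 'barcode' => '123456789-000',
    'wine_farm' => 'Robertson Wineries', 'price' => 60.00 },
  { 'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
    'name' => 'Example Shiraz', 'barcode' => '123456789-001',
    'wine_farm' => 'Example Estate', 'price' => 75.00 },
  { 'category' => ['Soft Drinks'],
    'name' => 'Example Cola', 'barcode' => '123456789-002', 'price' => 12.00 }
]

tree = build_tree(data)
puts JSON.pretty_generate(tree)
```

Each category lookup is a single hash access, so the work per item depends on the depth of its category path rather than on how many siblings exist at each level.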
Upvotes: 0
Reputation: 13921
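Although I have simplified it a bit from your requested structure, you can use the logic to get an idea of how it could be done. Note that each level is a hash keyed by category name, and the product is stored under its last category name, so a second product in the same category would overwrite the first.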
require 'pp'
x = [{
  "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
  "name" => "Robertson Merlot",
  "barcode" => '123456789-000',
  "wine_farm" => "Robertson Wineries",
  "price" => 60.00
}]
result = {}
x.each do |entry|
  # Save current level in a variable
  current_level = result
  # The last category gets special treatment: pop it and use it later as the key for the product.
  item = entry['category'].pop
  # For each remaining category, check if it exists, else add a category hash.
  entry['category'].each do |category|
    unless current_level.has_key?(category)
      current_level[category] = {'type' => 'category', 'children' => {}, 'name' => category}
    end
    current_level = current_level[category]['children'] # Set the new current level of the hash.
  end
  # Finally add the item:
  entry.delete('category')
  entry['type'] = 'product'
  current_level[item] = entry
end
pp result
And it gives us:
{"Alcoholic Beverages"=>
{"type"=>"category",
"children"=>
{"Wine"=>
{"type"=>"category",
"children"=>
{"Red Wine"=>
{"name"=>"Robertson Merlot",
"barcode"=>"123456789-000",
"wine_farm"=>"Robertson Wineries",
"price"=>60.0,
"type"=>"product"}},
"name"=>"Wine"}},
"name"=>"Alcoholic Beverages"}}
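If the array-of-children shape from the question is still needed, the hash above can be converted afterwards. This is only a sketch: `to_children_array` is a made-up helper name, and it assumes every product entry sits under its last category name, as in the output above, so it rebuilds that category around the product.

```ruby
require 'json'

# Hypothetical helper: turns one hash level into an array of nodes.
def to_children_array(level)
  level.map do |key, node|
    if node['type'] == 'category'
      { 'type' => 'category',
        'properties' => { 'name' => node['name'] },
        'children' => to_children_array(node['children']) }
    else
      # The product is keyed by its last category name, so wrap it in that category.
      props = node.reject { |k, _| k == 'type' }
      { 'type' => 'category',
        'properties' => { 'name' => key },
        'children' => [{ 'type' => 'product', 'properties' => props }] }
    end
  end
end

# The result hash printed above, written out as a literal.
result = {
  'Alcoholic Beverages' => {
    'type' => 'category',
    'children' => {
      'Wine' => {
        'type' => 'category',
        'children' => {
          'Red Wine' => {
            'name' => 'Robertson Merlot',
            'barcode' => '123456789-000',
            'wine_farm' => 'Robertson Wineries',
            'price' => 60.0,
            'type' => 'product'
          }
        },
        'name' => 'Wine'
      }
    },
    'name' => 'Alcoholic Beverages'
  }
}

tree = to_children_array(result)
puts JSON.pretty_generate(tree)
```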
Upvotes: 1