Steve

Reputation: 2854

Convert flat list of products and categories to tree structure

I currently have items in the following structure:

[{
    "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
    "name" => "Robertson Merlot",
    "barcode" => '123456789-000',
    "wine_farm" => "Robertson Wineries",
    "price" => 60.00
}]

This data is made up, but my real data has the same structure, and I cannot change the incoming data.

I have more than 100,000 of these.

Each product is nested in anywhere from 1 to n categories, with no upper limit on n.

Because the data is tabular, the categories are repeated for every product. I want to use a tree structure to remove this repetition and cut the file size by 25 to 30%.

I am aiming at a tree structure something like this:

{
    "type" => "category",
    "properties" => {
        "name" => "Alcoholic Beverages"
    },
    "children" => [{
        "type" => "category",
        "properties" => {
            "name" => "Wine"
        },
        "children" => [{
            "type" => "category",
            "properties" => {
                "name" => "Red Wine"
            },
            "children" => [{
                "type" => "product",
                "properties" => {
                    "name" => "Robertson Merlot",
                    "barcode" => '123456789-000',
                    "wine_farm" => "Robertson Wineries",
                    "price" => 60.00
                }
            }]
        }]
    }]
}

  1. I can't think of an efficient algorithm for this. I would appreciate a pointer in the right direction.

  2. Should I generate IDs and add a parent ID to each node? I am concerned that IDs will add length to the text, which I am trying to shorten.

Upvotes: 1

Views: 169

Answers (2)

Nabeel

Reputation: 2302

There are probably easier ways of doing this, but this is all I can think of for now. It should match your structure.

require 'json'

# Initial set up, it seems the root keys are always the same looking at your structure
products = {
  'type' => 'category',
  'properties' => {
    'name' => 'Alcoholic Beverages'
  },
  'children' => []
}

data = [{
    "category" => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
    "name" => 'Robertson Merlot',
    "barcode" => '123456789-000',
    "wine_farm" => 'Robertson Wineries',
    "price" => 60.00
}]

data.each do |item|
  # Make sure we set the current to the top-level again
  curr = products['children']

  # Remove first entry as it's always 'Alcoholic Beverages'
  item['category'].shift

  item['category'].each do |category|
    # Get the index for the category if it exists
    index = curr.index {|x| x['type'] == 'category' && x['properties']['name'] == category}

    # If it exists then change current hash level to the child of that category
    if index
      curr = curr[index]['children']

    # Else add it in
    else
      curr << {
        'type' => 'category',
        'properties' => {
          'name' => category
        },
        'children' => []
      }

      # We can use last as we know it'll be the last index.
      curr = curr.last['children']
    end
  end

  # Delete category from the item itself
  item.delete('category')

  # Add the item as product type to the last level of the hash
  curr << {
    'type' => 'product',
    'properties' => item
  }
end

puts JSON.pretty_generate(products)
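
With more than 100,000 items, two details of the code above may matter: `curr.index` scans the sibling array on every lookup, and the root category is hard-coded. Below is a rough sketch of one way around both, assuming category names are unique among siblings. The helper name `build_tree` and the `node_for` lookup hash are made up for illustration and are not part of the original answer. It returns an array of root nodes and leaves the input untouched.

```ruby
require 'json'

# Hypothetical helper, not part of the answer above.
# Finds each category node through a hash keyed by its full category path
# instead of scanning the sibling array, and allows more than one root.
def build_tree(items)
  roots = []      # top-level category nodes
  node_for = {}   # e.g. ["Alcoholic Beverages", "Wine"] => that category's node

  items.each do |item|
    siblings = roots
    path = []

    item['category'].each do |name|
      path += [name] # builds a new array, so keys already stored are never mutated
      node = node_for[path]
      if node.nil?
        node = {
          'type' => 'category',
          'properties' => { 'name' => name },
          'children' => []
        }
        node_for[path] = node
        siblings << node
      end
      siblings = node['children']
    end

    # Copy everything except the category list; the input hash is not modified.
    properties = item.reject { |key, _value| key == 'category' }
    siblings << { 'type' => 'product', 'properties' => properties }
  end

  roots
end

sample = [{
  'category' => ['Alcoholic Beverages', 'Wine', 'Red Wine'],
  'name' => 'Robertson Merlot',
  'barcode' => '123456789-000',
  'wine_farm' => 'Robertson Wineries',
  'price' => 60.00
}]

puts JSON.pretty_generate(build_tree(sample))
```

If you need a single root object like in the question, you could take the first element of the result, provided all items really do share one top-level category.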

Upvotes: 0

hirolau

Reputation: 13921

Although I have simplified it a bit from your requested structure, you can use the logic to get an idea of how it could be done:

require 'pp'
x = [{
    "category" => ["Alcoholic Beverages", "Wine", "Red Wine"],
    "name" => "Robertson Merlot",
    "barcode" => '123456789-000',
    "wine_farm" => "Robertson Wineries",
    "price" => 60.00
}]

result = {}

x.each do |entry|

  # Save current level in a variable
  current_level = result

  # The last category is handled differently: it becomes the key under
  # which the product is stored, so pop it off the category list.
  item = entry['category'].pop

  # For each category, check if it exists, else add a category hash.
  entry['category'].each do |category|
    unless current_level.has_key?(category)
      current_level[category] = {'type' => 'category', 'children' => {}, 'name' => category}
    end
    current_level = current_level[category]['children'] # Set the new current level of the hash.
  end

  # Finally add the item:
  entry.delete('category')
  entry['type'] = 'product'
  current_level[item] = entry

end

pp result

And it gives us:

{"Alcoholic Beverages"=>
  {"type"=>"category",
   "children"=>
    {"Wine"=>
      {"type"=>"category",
       "children"=>
        {"Red Wine"=>
          {"name"=>"Robertson Merlot",
           "barcode"=>"123456789-000",
           "wine_farm"=>"Robertson Wineries",
           "price"=>60.0,
           "type"=>"product"}},
       "name"=>"Wine"}},
   "name"=>"Alcoholic Beverages"}}

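One thing to watch with the code above: the product is stored under its last category name, so a second product in "Red Wine" would replace the first. A possible variant, sketched under the assumption that every product has a unique `barcode`, keeps every category as its own level and keys products by barcode instead. The method name `nest_products` is hypothetical.

```ruby
# Hypothetical variant of the loop above, not the answer's original code.
# Assumes every product carries a unique 'barcode' that never equals a
# category name at the same level.
def nest_products(entries)
  result = {}

  entries.each do |entry|
    current_level = result

    # Every category, including the last one, becomes a category node.
    entry['category'].each do |category|
      current_level[category] ||= { 'type' => 'category', 'name' => category, 'children' => {} }
      current_level = current_level[category]['children']
    end

    # Copy the product without its category list and key it by barcode,
    # so products sharing a category do not overwrite each other.
    product = entry.reject { |key, _value| key == 'category' }
    product['type'] = 'product'
    current_level[entry['barcode']] = product
  end

  result
end
```

Unlike the original loop, this copies each entry instead of modifying it, so the input array can be reused afterwards.
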
Upvotes: 1
