Reputation: 830
I've been thinking about this problem all day and wanted to get some other opinions on the best way approach this.
I recently got presented with this problem involving matching similar bill of materials (BOM). I've been trying to think of how this can be best done programmatically.
Assume there is an assembled part (assembly) which is composed of multiple component parts. For simplicity we'll assume the assembly only goes down one level. Lets say I want to make a function that returns all similar assemblies based on the similarity of the bill of materials. I would assume that a similarity rank of 1 means the BOM is identical, the assembly is made of the exact same number of component items. On the other hand, a rank of 0 means there are no similarities. The rank would be a range between 1 and 0 based on the assembly similarity. I would rank similarity on not only same part number but also same quantity of parts, but for simplicity I can ignore quantity for now.
How would you approach this? I work with SQL, but I am also just curious from a high level algorithm standpoint.
Upvotes: 0
Views: 917
Reputation: 20550
One well-known metric is https://en.wikipedia.org/wiki/Jaccard_index . Consider applying it to this domain.
Then one could view similarity in several ways, the simplest being a comparison of the set of components in BOM1 vs set of components in BOM2, ignoring component counts. So the similarity is one for identical sets, and decreases for each item that is added to just one of the sets. The Jaccard similarity divides size of intersection by size of union.
Now put component counts back in, so 2 bolts plus 3 nuts in BOM1 would give a size of 5. Again you could define a similarity, this time looking at intersections and unions of multisets. In your domain, it's possible that you would want to weight this differently, so that adding first unit of a novel part to a BOM produces a large change, and adding second unit of that part produces a much smaller change.
Upvotes: 2