Reputation: 8043
I have a situation where there is a parent-child relationship between two sets of data: a parent document collection and a child document collection. The requirement is that each parent and its corresponding children need to be exported into a single PDF document. A simple implementation of this could be as follows (Java-ish pseudocode below):
for (Document parentDocument : Documents) {
    ExportToPdf(parentDocument);
    for (Document childDocument : parentDocument.children()) {
        AppendToParentPdf(childDocument);
    }
}
Something like the above would probably solve the problem, but all of a sudden the requirements change: now each parent and each of its children need to be in separate PDFs. So the snippet above is modified by changing AppendToParentPdf() to ExportToPdf(), as follows:
for (Document parentDocument : Documents) {
    ExportToPdf(parentDocument);
    for (Document childDocument : parentDocument.children()) {
        ExportToPdf(childDocument);
    }
}
Continuing this way, it will not take long before this seemingly trivial snippet suffers from some serious code smells.
My questions to SO are:
Are there better representations of parent-child relationships such as the above, where instead of brute-forcing my way through all the documents and their children in an O(n^2) fashion, I can use a different data structure or technique to traverse the entire structure more optimally?
In the scenario that I described above, where the business rules are rather fluid about the way the PDFs should be exported, is there a smarter way to code the export function? Also, the export format is transient: PDFs may give way to .doc/.csv/.xml files, etc.
It would be great to get some perspective on this.
Thanks
Upvotes: 9
Views: 3877
Reputation: 3874
Using a Set to keep track of which elements have already been exported might not be the most beautiful solution, but it will prevent the documents from being exported twice.
Set<Document> alreadyExported = new HashSet<Document>();
for (Document parentDocument : Documents) {
    ExportToPdf(parentDocument);
    for (Document childDocument : parentDocument.children()) {
        if (!alreadyExported.contains(childDocument)) {
            ExportToPdf(childDocument);
            alreadyExported.add(childDocument);
        }
    }
}
Upvotes: 1
Reputation: 719346
Are there better representations of parent-child relationships such as the above where instead of brute-forcing my way through all the documents and their children in an O(n^2) fashion.
This is not O(N^2). It is O(N), where N is the total number of parent and child documents. Assuming that no child has more than one parent document, you can't significantly improve the performance. Furthermore, the cost of the traversal is probably trivial compared with the cost of the calls that generate the PDFs.
The only case where you might want to consider optimizing is if child documents can be children of multiple parents. In that case, you may want to keep track of the documents that you've already generated PDFs for ... and skip them if you revisit them in the traversal. The test for "have I seen this document before" can be implemented using a HashSet.
Upvotes: 4
Reputation: 18562
You could encapsulate what you want to do with a document in a handler. This will also allow you to define new handlers in the future that you can pass to existing code.
interface DocumentHandler {
    void process(Document d);
}

class ExportToPdf implements DocumentHandler { ... }
class AppendToParentPdf implements DocumentHandler { ... }

// Now you're just passing the interface whose implementation does something with the document
void handleDocument(DocumentHandler parentHandler, DocumentHandler childHandler) {
    for (Document parent : documents) {
        parentHandler.process(parent);
        for (Document child : parent.children()) {
            childHandler.process(child);
        }
    }
}
DocumentHandler appendToParent = new AppendToParentPdf();
DocumentHandler exportToPdf = new ExportToPdf();
// pass the child/parent handlers as needed
handleDocument(exportToPdf, appendToParent);
handleDocument(exportToPdf, exportToPdf);
As for efficiency, I'd say don't try to optimise unless you run into performance issues. In any case, the problem won't be with the nested loop but with the logic itself that processes the documents.
Upvotes: 3
Reputation: 6707
For your second question, you could use the provider pattern or an extension of it.
Provider pattern: this pattern has its roots in the Strategy pattern, and it lets you put your data and behavior behind an abstraction so that you can swap out the implementation at any time.
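As a rough sketch of how that could look here (ExportProvider, PdfExportProvider, CsvExportProvider and ExportProviderRegistry are made-up names for illustration; Document is the type from the question), implementations are registered under a format key and looked up at runtime:

import java.util.HashMap;
import java.util.Map;

// Hypothetical provider abstraction for the export step.
interface ExportProvider {
    void export(Document document);
}

class PdfExportProvider implements ExportProvider {
    @Override
    public void export(Document document) {
        // render the document as a PDF
    }
}

class CsvExportProvider implements ExportProvider {
    @Override
    public void export(Document document) {
        // render the document as CSV
    }
}

// Providers are registered under a format key and resolved at runtime, so the
// traversal code never changes when a new output format is added.
class ExportProviderRegistry {
    private final Map<String, ExportProvider> providers = new HashMap<String, ExportProvider>();

    void register(String format, ExportProvider provider) {
        providers.put(format, provider);
    }

    ExportProvider forFormat(String format) {
        ExportProvider provider = providers.get(format);
        if (provider == null) {
            throw new IllegalArgumentException("No provider registered for " + format);
        }
        return provider;
    }
}

The loop from the question then only ever calls registry.forFormat(configuredFormat).export(document), and swapping PDF for CSV (or anything else) becomes a configuration change rather than a code change.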
Upvotes: 2
Reputation: 15109
The second question's problem can be solved by simply creating an interface Exporter with the method export(Document doc); and then implementing it for each of the various formats, e.g. class DocExporterImpl implements Exporter.
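A minimal sketch of that idea, with hypothetical PdfExporterImpl and CsvExporterImpl implementations alongside the DocExporterImpl mentioned above (Document is the type from the question):

// The Exporter abstraction described above, one implementation per format.
interface Exporter {
    void export(Document doc);
}

class PdfExporterImpl implements Exporter {
    @Override
    public void export(Document doc) {
        // PDF-specific export logic
    }
}

class CsvExporterImpl implements Exporter {
    @Override
    public void export(Document doc) {
        // CSV-specific export logic
    }
}

The traversal code only ever sees the Exporter type, so adding a new format means adding a new implementation rather than touching the loop.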
The first question depends on your requirements, and no design pattern as such solves that kind of problem. Can't help you there.
Upvotes: 1
Reputation: 12883
I tried to fit this into a comment, but there's too much to say...
I don't see how the change you're talking about is a code smell. If the requirements change for this simple function, then they change. If you only need to make the change in one place, then it sounds like you've done a good job. If your client is going to need both ways of doing it (or more), then you might consider some sort of strategy pattern so you don't have to rewrite the surrounding code to do either function.
If you're making dozens of these changes per week, then it could get messy and you probably ought to make a plan for how to more effectively deal with a very busy axis of change. Otherwise, discipline and refactoring can help you keep it clean.
As to whether or not n² is a problem, it depends. How big is n? If you have to do this frequently (e.g. dozens of times per hour) and n is in the thousands, then you might have a problem. Otherwise I wouldn't sweat it as long as you're meeting or exceeding demand and your CPU/disk utilization is out of the danger zone.
Upvotes: 1