Nathaniel D. Waggoner

Reputation: 2886

How to efficiently find and replace in large KML files that contain HTML?

EDIT: These files are posted to a web server I'm working on. I don't have them all at hand right now, only a "representative sample".

I've got large KML files (on the order of 80000 lines, possibly larger) which contain XML and HTML, and I need to do a find and replace on specific elements of the XML,

namely

<href>some_random_file_name</href>

I need to replace the value present there with a different value. I had tried using something similar to this:

http://www.mkyong.com/java/how-to-modify-xml-file-in-java-dom-parser/

But I found that the HTML caused the parser to bug out and not find the elements I wanted.

Right now I'm iterating over the file line by line and looking for the elements I want, but this is horrendously slow. I need a relatively efficient way to handle this.

Iteration Code:

        File kml = new File(kmlFile);
        FileReader reader = new FileReader(kml);
        BufferedReader br = new BufferedReader(reader);
        String txt="";
        String line = null;
        while((line = br.readLine())!= null) {
            if(line.contains("href")) {
                String tmp = line.replace("<href>","");
                tmp = tmp.replace("</href>","");
                tmp = tmp.replaceAll("\t", "");
                tmp = tmp.replaceAll("images/", "");
                line = "<href>"+namesToIds.get(tmp)+"</href>";
            }
            txt+=line;
        }

        br.close();
        FileWriter writer = new FileWriter(kml);
        BufferedWriter bw = new BufferedWriter(writer);
        bw.write(txt);
        bw.flush();
        bw.close();

I don't think I can put the kml up right now. If it's vital I can try and pull a bunch of stuff out of it to sanitize it for the internet. I think there may be some proprietary things in it.

Upvotes: 0

Views: 406

Answers (1)

meriton

Reputation: 70574

txt+=line;

The concatenation operator creates a new string containing the concatenation of the left- and right-hand sides. That involves copying all characters in both operands. For instance, in the 1000th iteration of this loop, it will copy the current contents of txt and the contents of line, which together are the first 1000 lines of the file. Put differently, if you have n lines in the file, you will perform n * (n + 1) / 2 line copies in total. Copying the same lines over and over again is not an efficient way to go about this.
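To put a number on that, here is a small sketch that counts the line copies performed by the txt+=line loop. The class and method names are made up for illustration, and it simplifies by treating every line as one unit of copying regardless of its length:

```java
// Hypothetical helper, not from the question: counts how many line copies
// the `txt += line` loop performs for a file with n lines.
class ConcatCost {
    // Iteration i copies the i - 1 lines already in txt plus the new line,
    // so it costs i line copies.
    static long lineCopies(int n) {
        long copies = 0;
        for (int i = 1; i <= n; i++) {
            copies += i;
        }
        return copies; // same value as n * (n + 1) / 2
    }

    public static void main(String[] args) {
        // 80000 is the file size mentioned in the question.
        System.out.println(lineCopies(80_000));
    }
}
```

For the 80000-line file from the question this comes to 3,200,040,000 line copies, compared with 80000 if each line is written exactly once.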

Instead, you should accumulate the converted text in a StringBuilder, or even better, not accumulate it in memory at all, but write each line to the output file as soon as you have converted it.

Something like:

try (BufferedReader reader = new BufferedReader(new FileReader(kmlFile))) {
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
        String line = null;
        while ((line = reader.readLine()) != null) {
            // convert(line) stands for the href replacement from the question
            writer.write(convert(line));
            writer.write("\n");
        }
    }
}
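The convert method is not defined in the snippet above. A minimal sketch of what it could look like, reusing the string handling from the question, follows. It assumes namesToIds is a Map<String, String> (the question does not show its type). The class name HrefRewriter, the rewrite method, the trim() call, and the choice to leave a line unchanged when its name is not in the map are all my own assumptions, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical class: streams a KML file and rewrites <href> lines.
class HrefRewriter {
    private final Map<String, String> namesToIds; // assumed type

    HrefRewriter(Map<String, String> namesToIds) {
        this.namesToIds = namesToIds;
    }

    // Mirrors the loop body from the question: strip the tags, tabs and
    // the "images/" prefix, then look the remaining name up in the map.
    String convert(String line) {
        if (!line.contains("<href>")) {
            return line;
        }
        String name = line.replace("<href>", "")
                          .replace("</href>", "")
                          .replace("\t", "")
                          .replace("images/", "")
                          .trim();
        String id = namesToIds.get(name);
        // Assumption: keep the original line if the name is unknown,
        // instead of writing "<href>null</href>".
        return id == null ? line : "<href>" + id + "</href>";
    }

    // Writes each converted line immediately, so nothing accumulates in memory.
    // `in` and `out` must be different files, because opening the writer
    // truncates `out`.
    void rewrite(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(convert(line));
                writer.newLine();
            }
        }
    }
}
```

If the result has to end up under the original file name, one option is to write to a temporary file first and then replace the original with Files.move and StandardCopyOption.REPLACE_EXISTING.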

Upvotes: 1
