md1980

Reputation: 339

Contents of zip file getting written incorrectly

I am reading the contents of a zip file, and when I find the sample.xml file, I edit its contents and write it to the output zip file:

public class CopyEditZip {

static String fileSeparator = System.getProperty("file.separator");

    public static void main(String[] args) {
        System.getProperty("file.separator");

        ZipFile zipFile;
        try {
             zipFile = new ZipFile("c:/temp/source.zip");
             ZipOutputStream zos = new ZipOutputStream(new 
                                   FileOutputStream(
                                          "c:/temp/target.zip"));

             for (Enumeration e = zipFile.entries(); 
                        e.hasMoreElements();) 
                 {
                    ZipEntry entryIn = (ZipEntry) e.nextElement();
                    if (entryIn.getName().contains("sample.xml")) {
                        zos.putNextEntry(new ZipEntry("sample.xml"));
                        InputStream is = zipFile.getInputStream(entryIn);
                        byte[] buf = new byte[1024];
                        int len;
                        while ((len = (is.read(buf))) > 0) {
                            String x = new String(buf);
                            if (x.contains("Input")) {
                               System.out.println("edit count");
                                x = x.replace("Input", "output");
                            }
                            buf = x.getBytes();
                            zos.write(buf, 0, (len < buf.length) ? len
                                : buf.length);
                        }
                        is.close();
                        zos.closeEntry();
                    }
                 }
                zos.close();
                zipFile.close();
          } catch (Exception ex) {
        
        }

       }
     }

Now the sample.xml in the output is not coming out correctly. Some data is truncated and some is lost. Does this have to do with the buffer not getting written correctly? Is there an alternative way to edit the file and write it out?

EDIT: I see the XML is written, followed by some more data from the XML. My end tag is called Broker, and it is followed by a few more lines of data. I am not sure how it is writing more data after the end tag.

EDIT:

I added a counter and a System.out.println to see what came out during each iteration of the while loop.

Here is the output of the last two iterations:

18
put.fileFtpDirectory"/><ConfigurableProperty uri="CDTSFileInput#File     
Input.fileFtpServer"/><ConfigurableProperty uri="CDTSFileInput#File     
Input.fileFtpUser"/><ConfigurableProperty uri="CDTSFileInput#File 
Input.longRetryInterval"/><ConfigurableProperty uri="CDTSFileInput#File 
Input.messageCodedCharSetIdProperty"/><ConfigurableProperty 
uri="CDTSFileInput#File Input.messageEncodingProperty"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.remoteTransferType"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.retryThreshold"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.shortRetryInterval"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.validateMaster"/>
<ConfigurableProperty override="30" uri="CDTSFileInput#File 
Input.waitInterval"/><ConfigurableProperty override="no" 
uri="CDTSFileInput#FileInput.connectDatasourceBeforeFlowStarts"/>
<ConfigurableProperty uri="CDTSFileInput#FileInput.validateMaster"/>
<ConfigurableProperty override="/apps/cdts/trace/ExceptionTrace-
CDTSFileInput-CDT.REF_EXT.Q01.txt" uri="CDTSFileInp

19
ut#FilePath_ExceptionTrace"/><ConfigurableProperty   
override="/apps/cdts/trace/SnapTrace-CDTSFileInput-CDT.REF_EXT.Q01.txt"    
uri="CDTSFileInput#FilePath_SnapTraceENV"/><ConfigurableProperty    
override="/apps/cdts/trace/SnapTrace-CDTSFileInput-CDT.REF_EXT.Q01.txt" 
uri="CDTSFileInput#FilePath_SnapTraceNOENV"/><ConfigurableProperty   
override="EXTERNAL" uri="CDTSFileInput#INPUTORIGIN"/>
<ConfigurableProperty 
override="/apps/cdts/data_in/data_in_fileinput_gtr1" 
uri="CDTSFileInput#InputDirectory"/><ConfigurableProperty override="GTR" 
uri="CDTSFileInput#SUBMITTERID"/><ConfigurableProperty 
override="FILEINPT" uri="CDTSFileInput#SUBMITTERTYPE"/>
<ConfigurableProperty override="" uri="CDTSFileInput#excludePattern"/>
<ConfigurableProperty override="*" uri="CDTSFileInput#filenamePattern"/>
<ConfigurableProperty override="no" 
uri="CDTSFileInput#recursiveDirectories"/></CompiledMessageFlow>
</Broker>ileInput#FileInput.validateMaster"/><ConfigurableProperty 
override="/apps/cdts/trace/ExceptionTrace-CDTSFileInput-
CDT.REF_EXT.Q01.txt" uri="CDTSFileInp

The XML ends at </Broker>, but part of the output from the second-to-last iteration is appended again.

Upvotes: 0

Views: 1672

Answers (1)

RealSkeptic

Reputation: 34608

Reading and writing text

If the file is a text file, you shouldn't be reading it as bytes. You should wrap the input stream with a reader, read lines, and write them back to a writer wrapped around the output stream.

One of the reasons for this is that the file could be in an encoding that is not single-byte, like UTF-8. This means that a character can be split between one buffer and the next.
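To make the split-character problem concrete, here is a minimal sketch. The class and method names are made up for illustration. It decodes the same UTF-8 bytes once in a single chunk and once in one-byte chunks, the way a byte buffer boundary might cut them.

```java
import java.nio.charset.StandardCharsets;

class SplitCharacterDemo {
    // Decodes the bytes chunk by chunk, the way a byte-buffer read loop would.
    static String decodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeInChunks(data, 2).equals("\u00e9")); // true: character kept whole
        System.out.println(decodeInChunks(data, 1).equals("\u00e9")); // false: character cut in half
    }
}
```

In the second call each half is decoded on its own, is not valid UTF-8, and is replaced by the decoder's replacement character, so the original character is gone.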

Another problem is that the word Input might be split between buffers. So you might just get Inp in one and ut in the next, and you won't match it properly. Reading lines is a good way of ensuring that you won't be stopping in the middle of a word.
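Your own debug output shows this happening: one iteration ends with CDTSFileInp and the next begins with ut#FilePath. A small sketch of the effect, with hypothetical names and a deliberately tiny chunk size:

```java
class SplitWordDemo {
    // Applies the replacement to each fixed-size chunk independently,
    // as the buffer-based loop in the question does.
    static String replacePerChunk(String text, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < text.length(); off += chunkSize) {
            String chunk = text.substring(off, Math.min(text.length(), off + chunkSize));
            sb.append(chunk.replace("Input", "output"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "FileInput";
        System.out.println(replacePerChunk(text, 7));  // FileInput  (chunks "FileInp" + "ut", no match)
        System.out.println(replacePerChunk(text, 16)); // Fileoutput (single chunk, match found)
    }
}
```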

However, it's a little less simple to write text using a ZipOutputStream, as you don't get a separate output stream for each entry. Therefore, you'll need to extract the bytes from the line you read, and write those to the zip file - much like you did.
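As a rough illustration of that point, the sketch below writes two text entries through one shared ZipOutputStream and reads one back. It works in memory so that it is self-contained. The class name and entry names are invented for the example, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipTextEntries {
    // Writes each string as its own entry through the single shared stream.
    static byte[] zip(String[] names, String[] texts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < names.length; i++) {
                zos.putNextEntry(new ZipEntry(names[i]));             // start the entry
                zos.write(texts[i].getBytes(StandardCharsets.UTF_8)); // text converted to bytes
                zos.closeEntry();                                     // end the entry, stream stays open
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the text of the named entry, or null if there is no such entry.
    static String read(byte[] zipped, String name) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            for (ZipEntry e; (e = zis.getNextEntry()) != null; ) {
                if (e.getName().equals(name)) {
                    return new String(zis.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] zipped = zip(new String[] {"a.xml", "b.xml"}, new String[] {"<a/>", "<b/>"});
        System.out.println(read(zipped, "b.xml")); // <b/>
    }
}
```

The entries are delimited only by putNextEntry and closeEntry, so anything that closes the underlying stream (such as closing a Writer wrapped around it) would end the whole archive, not just the current entry.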

Reading and writing bytes

Even if the file happens to be in ASCII, you have a couple of problems in your read/write loops. The first, minor one is that your loop condition should be:

((len = is.read(buf)) >= 0)

You really should only terminate the loop when you get -1. In theory, a read in the middle of the stream could return 0 without reading any bytes at all (for example, if the buffer length is zero), but that doesn't mean the stream has ended. So >=, not >.

But your worse problem is that you read len bytes, but you translate the whole buffer to a string. So if you have a buffer of 1024 bytes, and len is only 50, then only 50 bytes of the buffer will be the content of the latest read, and the rest are going to come from the previous read, or be zero.

So always use exactly len bytes if that's what you read. You should be using

String x = new String(buf,0,len);

Rather than

String x = new String(buf);
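The difference is easy to reproduce with a short input and a small buffer. The following is only a sketch with made-up names. It reads ten ASCII bytes through an eight-byte buffer, so the second read fills just the first two positions and leaves six stale bytes behind them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StaleBufferDemo {
    // Reads the data through a bufSize-byte buffer and concatenates the decoded chunks.
    // useLen = false mimics new String(buf); useLen = true mimics new String(buf, 0, len).
    static String readAll(byte[] data, int bufSize, boolean useLen) {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[bufSize];
        try (InputStream is = new ByteArrayInputStream(data)) {
            int len;
            while ((len = is.read(buf)) >= 0) {
                sb.append(useLen
                        ? new String(buf, 0, len, StandardCharsets.US_ASCII)
                        : new String(buf, StandardCharsets.US_ASCII));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "ABCDEFGHIJ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(data, 8, false)); // ABCDEFGHIJCDEFGH (stale tail from the first read)
        System.out.println(readAll(data, 8, true));  // ABCDEFGHIJ
    }
}
```

The leftover text after </Broker> in your debug output looks like the same effect: it is the tail of the previous iteration's buffer.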

Also, you should note that when you do:

buf = x.getBytes();

Your buffer is no longer 1024 bytes long. If there were originally 1024 bytes, and you have 10 Input occurrences in your string, the buffer will now be 1034 bytes long (assuming a one-byte encoding), because output is one character longer than Input. len is no longer pertinent: it is smaller than the number of bytes you now need to write, so writing only len bytes drops the tail of the buffer. That's another reason why you have characters that are lost.
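A quick way to see the mismatch, with a short illustrative string standing in for the 1024-byte buffer (the names here are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class LengthAfterReplace {
    // Returns how many bytes the text occupies after the replacement.
    static int replacedLength(String text) {
        String replaced = text.replace("Input", "output");
        return replaced.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String chunk = "FileInput FileInput"; // pretend this is what one read returned
        int len = chunk.length();             // 19 bytes were read
        int newLen = replacedLength(chunk);   // each replacement adds one byte
        System.out.println(len + " -> " + newLen); // 19 -> 21
        // Writing only len bytes of the new array would drop the last newLen - len bytes.
    }
}
```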

Encoding

Usually, XML files are UTF-8. It is important to state the encoding explicitly when you convert bytes to string and vice versa, and also when you create readers and writers. Otherwise, characters may be read inappropriately.

Summary

  • Prefer a line-based read loop for a text file.
  • If you read bytes rather than lines: if you read len bytes, use len bytes, not the whole buffer.
  • If you change the data, don't use the old len.
  • State the encoding explicitly whenever you convert between bytes and text.

So a sketch of the new loop would be:

for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
    ZipEntry entryIn = e.nextElement();
    if (entryIn.getName().contains("sample.xml")) {
        zos.putNextEntry(new ZipEntry("sample.xml"));
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(zipFile.getInputStream(entryIn),
                                                                                      StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                if (line.contains("Input")) {
                    System.out.println("edit count");
                    line = line.replace("Input", "output");
                }
                line += System.lineSeparator(); // Add newline back.
                byte[] buf = line.getBytes(StandardCharsets.UTF_8);
                zos.write(buf);
            }
        }
        zos.closeEntry();
    }
}

Note:

  • Try-with-resources for opening the buffered reader. It will be closed automatically (with its underlying reader and input stream).
  • Don't use the raw type Enumeration. Use a proper wildcard and you'll be able to avoid the explicit cast.
  • Since you create a buffer from the full line, and only that line, you can write that full buffer and don't need offset and length.
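For completeness, here is one way the loop could be fitted into a whole program. Treat it as a sketch under a few assumptions that go beyond your question: the class and method names are mine, entries other than sample.xml are copied unchanged (your code skipped them), the edited entry keeps its original name, and InputStream.transferTo requires Java 9 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class CopyEditZipSketch {
    // Copies every entry of source into target, rewriting "Input" to "output"
    // in entries whose name contains "sample.xml".
    static void copyEdit(Path source, Path target) throws IOException {
        try (ZipFile zipFile = new ZipFile(source.toFile());
             ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(target))) {
            for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
                ZipEntry entryIn = e.nextElement();
                // A fresh ZipEntry avoids carrying over the old sizes and CRC.
                zos.putNextEntry(new ZipEntry(entryIn.getName()));
                try (InputStream is = zipFile.getInputStream(entryIn)) {
                    if (entryIn.getName().contains("sample.xml")) {
                        editText(is, zos);
                    } else {
                        is.transferTo(zos); // other entries are copied as raw bytes
                    }
                }
                zos.closeEntry();
            }
        }
    }

    // Reads UTF-8 lines, applies the replacement, and writes the bytes back.
    // The caller owns both streams, so nothing is closed here.
    static void editText(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.replace("Input", "output") + System.lineSeparator();
            out.write(line.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: CopyEditZipSketch <source.zip> <target.zip>");
            return;
        }
        copyEdit(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

With the paths from the question, it would be run with c:/temp/source.zip and c:/temp/target.zip as the two arguments. Note that readLine followed by System.lineSeparator() normalizes line endings to the platform default, which may or may not matter for your XML.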

Upvotes: 2
