Reputation: 339
I am reading the contents of a zip file, and when I find the sample.xml file, I edit its contents and write it to the output zip file.
public class CopyEditZip {
    static String fileSeparator = System.getProperty("file.separator");

    public static void main(String[] args) {
        System.getProperty("file.separator");
        ZipFile zipFile;
        try {
            zipFile = new ZipFile("c:/temp/source.zip");
            ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("c:/temp/target.zip"));
            for (Enumeration e = zipFile.entries(); e.hasMoreElements();) {
                ZipEntry entryIn = (ZipEntry) e.nextElement();
                if (entryIn.getName().contains("sample.xml")) {
                    zos.putNextEntry(new ZipEntry("sample.xml"));
                    InputStream is = zipFile.getInputStream(entryIn);
                    byte[] buf = new byte[1024];
                    int len;
                    while ((len = (is.read(buf))) > 0) {
                        String x = new String(buf);
                        if (x.contains("Input")) {
                            System.out.println("edit count");
                            x = x.replace("Input", "output");
                        }
                        buf = x.getBytes();
                        zos.write(buf, 0, (len < buf.length) ? len : buf.length);
                    }
                    is.close();
                    zos.closeEntry();
                }
            }
            zos.close();
            zipFile.close();
        } catch (Exception ex) {
        }
    }
}
Now the sample.xml in the output is not coming out correctly. Some data is truncated and some is lost. Does this have to do with the buffer not being written correctly? Is there an alternative way to edit the file and write it out?
EDIT: I see the XML is written, followed by some more data from the XML. My end tag is called Broker, and it is followed by a few more lines of data. I am not sure how more data is being written after the end tag.
EDIT:
I added a counter and a System.out.println of the buffer contents to see what came out during each iteration of the while loop.
Here is the output of the last two iterations:
18
put.fileFtpDirectory"/><ConfigurableProperty uri="CDTSFileInput#File
Input.fileFtpServer"/><ConfigurableProperty uri="CDTSFileInput#File
Input.fileFtpUser"/><ConfigurableProperty uri="CDTSFileInput#File
Input.longRetryInterval"/><ConfigurableProperty uri="CDTSFileInput#File
Input.messageCodedCharSetIdProperty"/><ConfigurableProperty
uri="CDTSFileInput#File Input.messageEncodingProperty"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.remoteTransferType"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.retryThreshold"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.shortRetryInterval"/>
<ConfigurableProperty uri="CDTSFileInput#File Input.validateMaster"/>
<ConfigurableProperty override="30" uri="CDTSFileInput#File
Input.waitInterval"/><ConfigurableProperty override="no"
uri="CDTSFileInput#FileInput.connectDatasourceBeforeFlowStarts"/>
<ConfigurableProperty uri="CDTSFileInput#FileInput.validateMaster"/>
<ConfigurableProperty override="/apps/cdts/trace/ExceptionTrace-
CDTSFileInput-CDT.REF_EXT.Q01.txt" uri="CDTSFileInp
19
ut#FilePath_ExceptionTrace"/><ConfigurableProperty
override="/apps/cdts/trace/SnapTrace-CDTSFileInput-CDT.REF_EXT.Q01.txt"
uri="CDTSFileInput#FilePath_SnapTraceENV"/><ConfigurableProperty
override="/apps/cdts/trace/SnapTrace-CDTSFileInput-CDT.REF_EXT.Q01.txt"
uri="CDTSFileInput#FilePath_SnapTraceNOENV"/><ConfigurableProperty
override="EXTERNAL" uri="CDTSFileInput#INPUTORIGIN"/>
<ConfigurableProperty
override="/apps/cdts/data_in/data_in_fileinput_gtr1"
uri="CDTSFileInput#InputDirectory"/><ConfigurableProperty override="GTR"
uri="CDTSFileInput#SUBMITTERID"/><ConfigurableProperty
override="FILEINPT" uri="CDTSFileInput#SUBMITTERTYPE"/>
<ConfigurableProperty override="" uri="CDTSFileInput#excludePattern"/>
<ConfigurableProperty override="*" uri="CDTSFileInput#filenamePattern"/>
<ConfigurableProperty override="no"
uri="CDTSFileInput#recursiveDirectories"/></CompiledMessageFlow>
</Broker>ileInput#FileInput.validateMaster"/><ConfigurableProperty
override="/apps/cdts/trace/ExceptionTrace-CDTSFileInput-
CDT.REF_EXT.Q01.txt" uri="CDTSFileInp
The XML ends at </Broker>, but part of the second-to-last chunk is appended again after it.
Upvotes: 0
Views: 1672
Reputation: 34608
Reading and writing text
If the file is a text file, you shouldn't be reading it as bytes. You should wrap the input stream with a reader, read lines, and write them back to a writer wrapped around the output stream.
One of the reasons for this is that the file could be in an encoding that is not single-byte, like UTF-8. This means that a character can be split between one buffer and the next.
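As a small illustration of the split-character problem, here is a minimal sketch that decodes a two-byte UTF-8 character in two separate chunks, the way a fixed-size byte buffer loop could. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitCharDemo {
    // Decodes the data as two independent chunks, split at the given byte offset.
    static String decodeInChunks(byte[] data, int splitAt) {
        String first = new String(Arrays.copyOfRange(data, 0, splitAt), StandardCharsets.UTF_8);
        String second = new String(Arrays.copyOfRange(data, splitAt, data.length), StandardCharsets.UTF_8);
        return first + second;
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8);
        String whole = new String(data, StandardCharsets.UTF_8);
        String chunked = decodeInChunks(data, 1); // split in the middle of the character

        System.out.println(whole.equals("\u00e9"));   // true
        System.out.println(chunked.equals("\u00e9")); // false: each half decodes to a replacement character
    }
}
```

A Reader keeps partial characters until the remaining bytes arrive, so it does not have this problem.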
Another problem is that the word Input might be split between buffers. So you might just get Inp in one and ut in the next, and you won't match it properly. Reading lines is a good way of ensuring that you won't be stopping in the middle of a word.
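The missed match can be reproduced without any I/O. The sketch below applies the replacement to fixed-size chunks of a string, which is roughly what happens when each buffer is edited on its own; the names and the chunk size of 7 are chosen only to force the split.

```java
public class SplitWordDemo {
    // Applies the replacement to each chunk separately, as a per-buffer edit would.
    static String replacePerChunk(String text, int chunkSize) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i += chunkSize) {
            String chunk = text.substring(i, Math.min(text.length(), i + chunkSize));
            out.append(chunk.replace("Input", "output"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Chunks are "FileInp" and "ut", so neither contains the whole word.
        System.out.println(replacePerChunk("FileInput", 7));    // FileInput
        // One chunk holds the whole text, so the word is found.
        System.out.println(replacePerChunk("FileInput", 1024)); // Fileoutput
    }
}
```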
However, it's a little less simple to write text using a ZipOutputStream, as you don't get a separate output stream for each entry. Therefore, you'll need to extract the bytes from the line you read, and write those to the zip file - much like you did.
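Here is a minimal sketch of that idea, using an in-memory archive so it can be run on its own. The helper names zipLines and readFirstEntry are invented for this example, and it assumes UTF-8 text with "\n" line endings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipTextEntryDemo {
    // Writes the given lines as one UTF-8 text entry and returns the archive bytes.
    static byte[] zipLines(String entryName, String... lines) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(entryName));
            for (String line : lines) {
                // Convert each line to bytes explicitly and write exactly those bytes.
                zos.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the first entry of the archive back as a UTF-8 string.
    static String readFirstEntry(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            zis.getNextEntry(); // position the stream at the first entry
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int len;
            while ((len = zis.read(buf)) >= 0) {
                content.write(buf, 0, len);
            }
            return new String(content.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipLines("sample.xml", "<Broker>", "</Broker>");
        System.out.print(readFirstEntry(zip));
    }
}
```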
Reading and writing bytes
Even if the file happens to be in ASCII, you have a couple of problems in your read/write loops. The first, minor one is that your loop condition should be:
((len = is.read(buf)) >= 0)
You really should only terminate the loop when you get -1. In theory, a read in the middle of the loop could return no bytes at all (for example, if the buffer size is zero), but that doesn't mean the stream has ended. So >=, not >.
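For reference, a plain byte copy with that loop condition could look like the sketch below. The copy helper is a name made up for this example, not something from the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoopDemo {
    // Copies everything from in to out and stops only when read() reports end of stream (-1).
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int len;
        while ((len = in.read(buf)) >= 0) {
            out.write(buf, 0, len); // write only the bytes this read produced
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3000]; // spans several 1024-byte reads
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);
        System.out.println(copied + " bytes copied"); // 3000 bytes copied
    }
}
```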
But your worse problem is that you read len bytes, but you translate the whole buffer to a string. So if you have a buffer of 1024 bytes, and len is only 50, then only 50 bytes of the buffer will be the content of the latest read, and the rest are going to come from the previous read, or be zero.
So always use exactly len bytes if that's what you read. You should be using
String x = new String(buf,0,len);
Rather than
String x = new String(buf);
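The effect is easy to reproduce with a tiny buffer. In this sketch (class and method names are invented, and the 4-byte buffer is only there to make the last read short), decoding the whole buffer repeats the tail of the previous read, which matches the extra data you see after your end tag.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StaleBufferDemo {
    // Decodes a stream chunk by chunk; useLen selects new String(buf, 0, len) over new String(buf).
    static String decode(byte[] data, int bufferSize, boolean useLen) throws IOException {
        InputStream is = new ByteArrayInputStream(data);
        byte[] buf = new byte[bufferSize];
        StringBuilder out = new StringBuilder();
        int len;
        while ((len = is.read(buf)) >= 0) {
            out.append(useLen
                    ? new String(buf, 0, len, StandardCharsets.US_ASCII)
                    : new String(buf, StandardCharsets.US_ASCII));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdXY".getBytes(StandardCharsets.US_ASCII);
        // The second read fills only 2 of the 4 bytes; "cd" is left over from the first read.
        System.out.println(decode(data, 4, false)); // abcdXYcd
        System.out.println(decode(data, 4, true));  // abcdXY
    }
}
```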
Also, you should note that when you do:
buf = x.getBytes();
Your buffer is no longer 1024 bytes long. If there were originally 1024 bytes, and you have 10 Input occurrences in your string, the buffer will now be 1034 bytes long (assuming a one-byte encoding). len is no longer pertinent: it will be smaller than the number of bytes in the new buffer, so the tail of the edited text is never written. That's another reason why you have characters that are lost.
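To make the lost tail concrete, this sketch mimics the write in the question on a single short chunk: it replaces the text, then keeps at most the original len bytes. The names are made up, and ASCII is assumed so that one character is one byte.

```java
import java.nio.charset.StandardCharsets;

public class GrowingBufferDemo {
    // Replaces in the chunk, then keeps at most as many bytes as were originally read.
    static String writtenAfterReplace(String chunk) {
        int len = chunk.getBytes(StandardCharsets.US_ASCII).length; // bytes originally read
        byte[] buf = chunk.replace("Input", "output").getBytes(StandardCharsets.US_ASCII);
        int written = Math.min(len, buf.length); // same limit as the original zos.write call
        return new String(buf, 0, written, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 17 bytes in, 20 bytes after replacement, but only 17 are written.
        System.out.println(writtenAfterReplace("Input Input Input")); // output output out
    }
}
```

Writing the full length of the new array avoids the truncation.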
Encoding
Usually, XML files are UTF-8. It is important to state the encoding explicitly when you convert bytes to string and vice versa, and also when you create readers and writers. Otherwise, characters may be read inappropriately.
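A short sketch of what goes wrong when the charset used for decoding does not match the one the bytes were written in. The decodeAs helper is an invented name; ISO-8859-1 stands in here for whatever the platform default might happen to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Decodes the bytes with the charset the caller names explicitly.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes for 4 characters

        // Matching charset: the text survives the round trip.
        System.out.println(decodeAs(utf8, StandardCharsets.UTF_8).equals("caf\u00e9")); // true
        // Wrong charset: the two bytes of the accented letter become two separate characters.
        System.out.println(decodeAs(utf8, StandardCharsets.ISO_8859_1).length());       // 5
    }
}
```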
Summary
If you read len bytes, use len bytes, not the whole buffer.
So a sketch of the new loop would be:
for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
    ZipEntry entryIn = e.nextElement();
    if (entryIn.getName().contains("sample.xml")) {
        zos.putNextEntry(new ZipEntry("sample.xml"));
        // Read the entry as UTF-8 text, one line at a time.
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(zipFile.getInputStream(entryIn), StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                if (line.contains("Input")) {
                    System.out.println("edit count");
                    line = line.replace("Input", "output");
                }
                line += System.lineSeparator(); // Add newline back.
                byte[] buf = line.getBytes(StandardCharsets.UTF_8);
                zos.write(buf); // Write every byte of the edited line.
            }
        }
        zos.closeEntry();
    }
}
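For completeness, here is one way the loop could sit in a full, runnable program. This is only a sketch under a few assumptions that go beyond the question: it copies all other entries unchanged, keeps each entry's original name, and builds a small test archive in temporary files instead of using c:/temp. The names copyAndEdit, readEntry and demo are made up for the example.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.*;

public class CopyEditZipSketch {
    // Copies every entry; in entries whose name contains "sample.xml", replaces "Input" with "output".
    static void copyAndEdit(File source, File target) throws IOException {
        try (ZipFile zipFile = new ZipFile(source);
             ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(target))) {
            for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
                ZipEntry entryIn = e.nextElement();
                zos.putNextEntry(new ZipEntry(entryIn.getName())); // fresh entry, sizes recomputed
                try (InputStream is = zipFile.getInputStream(entryIn)) {
                    if (entryIn.getName().contains("sample.xml")) {
                        BufferedReader reader =
                                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
                        String line;
                        while ((line = reader.readLine()) != null) {
                            line = line.replace("Input", "output") + System.lineSeparator();
                            zos.write(line.getBytes(StandardCharsets.UTF_8));
                        }
                    } else {
                        byte[] buf = new byte[1024];
                        int len;
                        while ((len = is.read(buf)) >= 0) {
                            zos.write(buf, 0, len); // exactly len bytes, never the whole buffer
                        }
                    }
                }
                zos.closeEntry();
            }
        }
    }

    // Reads one named entry back as UTF-8 text.
    static String readEntry(ZipFile zip, String name) throws IOException {
        try (InputStream is = zip.getInputStream(zip.getEntry(name))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int len;
            while ((len = is.read(buf)) >= 0) {
                out.write(buf, 0, len);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Builds a two-entry archive, runs copyAndEdit, and returns both entries of the result.
    static String[] demo() throws IOException {
        File source = File.createTempFile("source", ".zip");
        File target = File.createTempFile("target", ".zip");
        source.deleteOnExit();
        target.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(source))) {
            zos.putNextEntry(new ZipEntry("sample.xml"));
            zos.write("<Broker uri=\"FileInput\"/>\n".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("other.txt"));
            zos.write("Input stays here\n".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        copyAndEdit(source, target);
        try (ZipFile result = new ZipFile(target)) {
            return new String[] { readEntry(result, "sample.xml"), readEntry(result, "other.txt") };
        }
    }

    public static void main(String[] args) throws IOException {
        String[] entries = demo();
        System.out.print(entries[0]); // edited XML entry
        System.out.print(entries[1]); // untouched text entry
    }
}
```

Because readLine() strips the line terminator and System.lineSeparator() adds the platform's own, line endings in the edited entry may differ from the source file.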
Note: don't use a raw Enumeration. Use a proper wildcard and you'll be able to avoid the explicit cast.
Upvotes: 2