Clanrat

Reputation: 43

Can't open a file from a list of file names when the index exceeds a certain value in MATLAB

I'm trying to read individual .xml files from a .zip containing 60000+ .xml files without having to extract the whole archive. Each .xml file is named HMDB#.xml, where # is a 5-digit number.
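For illustration, the naming scheme can be written out as below. This is only a sketch: hmdbNumber and entryName are hypothetical names, and it assumes the entries are named exactly HMDB, then the zero-padded 5-digit number, then .xml.

```matlab
% Hypothetical example: build the entry name for a numeric ID.
% Assumes names of the form 'HMDB' + 5 zero-padded digits + '.xml'.
hmdbNumber = 2;                               % example ID
hmdbid     = sprintf('%05d', hmdbNumber);     % zero-pad to 5 digits
entryName  = sprintf('HMDB%s.xml', hmdbid);   % name expected in the archive
disp(entryName)
```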

Each .xml file is about 25 KB in size (±5 KB).

I am using the following code to do this at the moment. path is a string containing the path to the .zip file and hmdbid is a string containing the 5-digit number:

%// Opens the zip file and creates temporary directories for the files so data
%// can be extracted.

function data=partzip(path,hmdbid)

    zipFilename = path;
    zipJavaFile = java.io.File(zipFilename);
    zipFile=org.apache.tools.zip.ZipFile(zipJavaFile);
    entries=zipFile.getEntries;
    cnt=1;

    while entries.hasMoreElements
        tempObj=entries.nextElement;
        file{cnt,1}=tempObj.getName.toCharArray';
        cnt=cnt+1;
    end

    %// Match entries whose name ends in <hmdbid>.xml (the dot is escaped).
    ind=regexp(file,sprintf('%s\\.xml$',hmdbid));
    ind=find(~cellfun(@isempty,ind));
    file=file(ind);
    file = cellfun(@(x) fullfile('.',x),file,'UniformOutput',false);

    data=extract_data(file{1});
    zipFile.close;
end
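For comparison, here is a minimal sketch of reading one entry's contents straight from the archive through a Java input stream, so that nothing has to exist on disk. It is not the code above. It swaps in java.util.zip.ZipFile from the standard JDK, builds a small demo archive with hypothetical names so it runs on its own, and assumes the .xml files are UTF-8 and that the entry name is the full name stored in the archive.

```matlab
% --- Demo archive (hypothetical names, only so the sketch runs on its own) ---
workDir = tempname;
mkdir(workDir);
fid = fopen(fullfile(workDir, 'HMDB00002.xml'), 'w');
fprintf(fid, '<metabolite>HMDB00002</metabolite>');
fclose(fid);
zipPath = fullfile(workDir, 'demo.zip');
zip(zipPath, 'HMDB00002.xml', workDir);

% --- Read one entry straight from the archive, without unpacking it ---
entryName = 'HMDB00002.xml';                  % full name inside the archive
zipFile   = java.util.zip.ZipFile(zipPath);   % standard JDK class
entry     = zipFile.getEntry(entryName);      % [] when no such entry exists
if isempty(entry)
    zipFile.close();
    error('No entry named %s in %s.', entryName, zipPath);
end

% Using '\A' as delimiter makes the Scanner return the whole stream as one
% token. UTF-8 is an assumption about the .xml files.
scanner = java.util.Scanner(zipFile.getInputStream(entry), 'UTF-8');
scanner.useDelimiter('\A');
xmlText = '';
if scanner.hasNext()
    xmlText = char(scanner.next());
end
scanner.close();
zipFile.close();
```

If extract_data only accepts a file name, xmlText could first be written to a temporary file and that file's name passed on instead.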

When testing the code with a .zip file containing:

The code works fine when hmdbid is 00002, 00005 or 00008. For any ID beyond these, my data extraction function returns a file-not-found error.

I have tried several combinations of files with different file names, with the same result. The first 3 files work fine but the others don't, regardless of the name of the file.

I have tried creating a .zip containing 100 test .xml files, each containing only its own file name, and extracting from these works fine. This leads me to believe it's a memory issue, but I'm not sure how to fix it.
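One way to tell a memory problem from a path problem might be to check, for each entry listed in the archive, whether a file of the same name also exists on disk. The sketch below assumes that extract_data opens its argument as an ordinary file path, which is a guess because its code is not shown. It builds its own two-file demo archive with hypothetical names and then deletes one of the files, so that file exists only inside the zip.

```matlab
% --- Demo archive with two entries (hypothetical names) ---
workDir = tempname;
mkdir(workDir);
names = {'HMDB00002.xml', 'HMDB00010.xml'};
for k = 1:numel(names)
    fid = fopen(fullfile(workDir, names{k}), 'w');
    fprintf(fid, '%s', names{k});             % file contains only its own name
    fclose(fid);
end
zipPath = fullfile(workDir, 'demo.zip');
zip(zipPath, names, workDir);
delete(fullfile(workDir, names{2}));          % now present only inside the zip

% --- For every entry, report whether a same-named file exists on disk ---
zipFile = java.util.zip.ZipFile(zipPath);     % standard JDK class
entries = zipFile.entries();
onDisk  = containers.Map('KeyType', 'char', 'ValueType', 'logical');
while entries.hasMoreElements()
    entry = entries.nextElement();
    name  = char(entry.getName());
    onDisk(name) = exist(fullfile(workDir, name), 'file') == 2;
    fprintf('%-16s in archive: yes   on disk: %d\n', name, onDisk(name));
end
zipFile.close();
```

If the entries that fail are the ones reported as not on disk, the error would point to the files never having been unpacked rather than to memory.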

Upvotes: 0

Views: 80

Answers (0)
