transilvlad

Reputation: 14532

Extract a file from a ZIP string

I have a BASE64 string of a zip file that contains one single XML file.

Any ideas on how I could get the contents of the XML file without having to deal with files on the disk?

I would very much like to keep the whole process in memory, as the XML is only 1-5 KB.

It would be annoying to have to write the zip, extract the XML and then load it up and delete everything.

Upvotes: 25

Views: 28228

Answers (9)

Fabien Salles

Reputation: 1181

I had the same problem (without base64 encoding) and this works for me:

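public function getContent(string $zipContent): string
{
     // create a temporary file, write the zip content to it and open it
     $tmpZipFile = tmpfile();
     fwrite($tmpZipFile, $zipContent);
     $zip = new ZipArchive();
     if ($zip->open(stream_get_meta_data($tmpZipFile)['uri']) !== true) {
         fclose($tmpZipFile);
         throw new \RuntimeException('Unable to open the zip archive');
     }

     // retrieve the first file of the zip archive (false on failure)
     $content = $zip->getFromIndex(0);

     // close the archive and delete the temporary file
     $zip->close();
     fclose($tmpZipFile);

     if ($content === false) {
         throw new \RuntimeException('Unable to read the first file of the archive');
     }

     return $content;
}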

Upvotes: 0

MVN_Flugtag

Reputation: 1

Thanks to @toster-cx for the main idea. I have extended it and resolved the problem with a zero $head['csize'].

In some cases that field is zero (for example when the sizes are written to a data descriptor instead), and the method got stuck: the header holding the real length is located after the block of compressed data, whose length is unknown. Fortunately, there is another set of headers, the central directory, from which all the missing data can be extracted so that @toster-cx's method can be applied again.

My version also supports extracting multiple files, putting them into an array keyed by filename.

https://stackoverflow.com/a/76642785/22194816

The link above has the code; feel free to use and share it. It also contains @see references that lead to the zip specification for better understanding.

Upvotes: 0

9VBlock

Reputation: 1

The idea from toster-cx is also quite useful for handling malformed zip files.

I had one with missing data in the local header, so I had to read the central directory file header instead, using his method:

$CDFHoffset = strpos( $zipFile, "\x50\x4b\x01\x02" );
$CDFH = unpack( "Vsig/vverby/vverex/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr( $zipFile, $CDFHoffset, 46 ) );
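Once a central directory entry has been parsed, its sizes can be combined with the local header offset stored in the same 46-byte record to find and inflate the data. The following is only a sketch of that idea, not the code from any of the answers here: the function name zipEntriesFromCentralDirectory is hypothetical, and it assumes a single-disk, non-ZIP64, unencrypted archive whose entries are either stored or deflated. It locates the central directory through the End of Central Directory record instead of searching for the first "PK\x01\x02", and it does not verify CRCs.

```php
<?php
// Hypothetical helper: returns [filename => contents] for a ZIP held in a string.
// Assumes a single-disk, non-ZIP64, unencrypted archive using "stored" or "deflate".
function zipEntriesFromCentralDirectory(string $data): array
{
    // The End of Central Directory record (22 bytes + comment) closes the archive.
    $eocdPos = strrpos($data, "PK\x05\x06");
    if ($eocdPos === false || strlen($data) - $eocdPos < 22) {
        throw new RuntimeException('End of central directory record not found');
    }
    $eocd = unpack(
        'Vsig/vdisk/vcddisk/vdiskentries/ventries/Vcdsize/Vcdoffset/vcomlen',
        substr($data, $eocdPos, 22)
    );

    $files = [];
    $pos = $eocd['cdoffset'];
    for ($i = 0; $i < $eocd['entries']; $i++) {
        // Full 46-byte central directory file header, including the local header offset.
        $cdfh = unpack(
            'Vsig/vverby/vverex/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/'
            . 'vnamelen/vexlen/vcomlen/vdisk/vintattr/Vextattr/Voffset',
            substr($data, $pos, 46)
        );
        if ($cdfh === false || $cdfh['sig'] !== 0x02014b50) {
            throw new RuntimeException('Bad central directory file header');
        }
        $name = substr($data, $pos + 46, $cdfh['namelen']);
        $pos += 46 + $cdfh['namelen'] + $cdfh['exlen'] + $cdfh['comlen'];

        // The local header carries its own name/extra lengths, which may differ.
        $local = unpack('vnamelen/vexlen', substr($data, $cdfh['offset'] + 26, 4));
        if ($local === false) {
            throw new RuntimeException('Truncated local file header');
        }
        $start = $cdfh['offset'] + 30 + $local['namelen'] + $local['exlen'];

        // The compressed size is taken from the central directory, not the local header.
        $body = substr($data, $start, $cdfh['csize']);
        if ($cdfh['meth'] === 0) {
            $raw = $body;               // stored, no compression
        } elseif ($cdfh['meth'] === 8) {
            $raw = gzinflate($body);    // raw deflate
        } else {
            throw new RuntimeException('Unsupported compression method');
        }
        if ($raw === false) {
            throw new RuntimeException('Could not inflate ' . $name);
        }
        $files[$name] = $raw;
    }
    return $files;
}
```

For the question's case this would be called as zipEntriesFromCentralDirectory(base64_decode($base64String)), after which the single XML file is the only element of the returned array.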

Upvotes: 0

keV

Reputation: 11

If you are running on Linux and administer the system yourself, you could mount a small ramdisk using tmpfs. The standard file_get / put and ZipArchive functions will then work as usual, except that they write to memory instead of to disk. To have it permanently available, the fstab entry is something like:

tmpfs /media/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=2M 0 0

Set the size and location to suit your needs. Using PHP to mount a ramdisk and remove it after use (if it even has the privileges) is probably less efficient than just writing to disk, unless you have a massive number of files to process in one go. Note that this is neither a pure PHP solution nor portable. You will still need to remove the "files" after use, or have the OS clean up old files. They will of course not persist across reboots or remounts of the ramdisk.

Upvotes: 1

Boris

Reputation: 29

toster-cx had it right, and you should award him the points. This is an example where the zip comes from a SOAP response as a byte array (binary) and the content is an XML file:

// $parameters holds the arguments for your own SOAP call
$objResponse = $objClient->__soapCall("sendBill", array($parameters));
$fileData = unzipByteArray($objResponse->applicationResponse);
header("Content-type: text/xml");
echo $fileData;

function unzipByteArray($data) {
  $format = "Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen";
  /* the first entry in this archive is a directory, so skip over it */
  $head = unpack($format, substr($data, 0, 30));
  $offset = 30 + $head['namelen'] + $head['exlen'] + $head['csize'];
  /* the second entry is the actual file */
  $head = unpack($format, substr($data, $offset, 30));
  $raw = gzinflate(substr($data, $offset + 30 + $head['namelen'] + $head['exlen'], $head['csize']));
  /* you could loop here and keep decompressing if there were more files */
  return $raw;
}

Upvotes: 2

toster-cx

Reputation: 2387

I had a similar problem and ended up doing it manually.
https://www.pkware.com/documents/casestudies/APPNOTE.TXT

This extracts a single file (just the first one), performs no error or CRC checks, and assumes deflate was used.

// zip in a string
$data = file_get_contents('test.zip');

// parse the 30-byte local file header, then inflate the data that follows the name and extra field
$head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,0,30));
$filename = substr($data,30,$head['namelen']);
$raw = gzinflate(substr($data,30+$head['namelen']+$head['exlen'],$head['csize']));

// first file uncompressed and ready to use
file_put_contents($filename,$raw);
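Applied to the question, the same header parsing can run directly on the decoded base64 string, so nothing touches the disk. The version below is a sketch that adds the checks the snippet above leaves out. The function name zipFirstEntry is hypothetical, the CRC comparison assumes a 64-bit PHP build (where unpack("V") and crc32() both yield non-negative integers), and archives whose sizes are only stored in a data descriptor are rejected rather than handled.

```php
<?php
// Hypothetical helper: extract the first entry of a ZIP archive held in a string.
function zipFirstEntry(string $data): string
{
    if (strlen($data) < 30) {
        throw new RuntimeException('Too short to contain a local file header');
    }
    $head = unpack(
        'Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen',
        substr($data, 0, 30)
    );
    if ($head['sig'] !== 0x04034b50) {
        throw new RuntimeException('Missing local file header signature');
    }
    if ($head['flag'] & 0x08) {
        // Bit 3 set: crc and sizes live in a data descriptor after the data,
        // so the local header alone is not enough.
        throw new RuntimeException('Sizes are not stored in the local header');
    }

    $body = substr($data, 30 + $head['namelen'] + $head['exlen'], $head['csize']);
    switch ($head['meth']) {
        case 0:
            $raw = $body;             // stored, no compression
            break;
        case 8:
            $raw = gzinflate($body);  // raw deflate
            break;
        default:
            throw new RuntimeException('Unsupported compression method');
    }

    // Assumes 64-bit PHP, where both values are non-negative integers.
    if ($raw === false || crc32($raw) !== $head['crc']) {
        throw new RuntimeException('Decompression or CRC check failed');
    }
    return $raw;
}
```

With the base64 string from the question in a variable such as $base64String, the XML would then be obtained with zipFirstEntry(base64_decode($base64String)).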

Upvotes: 25

Savageman

Reputation: 9487

If you know the file name inside the .zip, just do this:

<?php
$xml = file_get_contents('zip://./your-zip.zip#your-file.xml');

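If you have a plain string, just do this (note that compress.zlib:// reads gzip streams, like gzopen() does, so this applies to gzip-compressed data rather than to a ZIP archive):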

<?php
$xml = file_get_contents('compress.zlib://data://text/plain;base64,'.$base64_encoded_string);

[edit] Documentation is there: http://www.php.net/manual/en/wrappers.php

From the comments: if you don't have a base64 encoded string, you need to urlencode() it before using the data:// wrapper.

<?php
$xml = file_get_contents('compress.zlib://data://text/plain,'.urlencode($text));

[edit 2] Even if you have already found a solution using a file, here is another approach (untested) that I didn't see in your answer:

<?php
$zip = new ZipArchive;
$zip->open('data://text/plain,'.urlencode($base64_decoded_string));
$zip2 = new ZipArchive;
$zip2->open('data://text/plain;base64,'.urlencode($base64_string));

Upvotes: 1

HenningCash

Reputation: 2120

After some hours of research I think that, surprisingly, it is not possible to handle a zip without a temporary file:

  1. The first attempt with php://memory will not work, because it is a stream that cannot be read by functions like file_get_contents() or ZipArchive::open(). The comments link to a report in the PHP bug tracker about the missing documentation of this limitation.
  2. ZipArchive has stream support via ::getStream(), but as stated in the manual, it only supports read operations on an already opened file. So you cannot build an archive on the fly with it.
  3. The zip:// wrapper is also read-only: Create ZIP file with fopen() wrapper
  4. I also made some attempts with the other PHP wrappers/protocols, like

     file_get_contents("zip://data://text/plain;base64,{$base64_string}#test.txt")
     $zip->open("php://filter/read=convert.base64-decode/resource={$base64_string}")
     $zip->open("php://filter/read=/resource=php://memory")


    but for me they don't work at all, even though there are examples like that in the manual. So you have to bite the bullet and create a temporary file.


Original Answer:

This only covers the temporary storage. I hope you can manage the zip handling and the XML parsing on your own.

Use the php://memory (doc) wrapper. Be aware that this is only useful for small files, because the data is stored in memory. Otherwise use php://temp instead.

<?php

// the decoded content of your zip file
$text = 'base64 _decoded_ zip content';

// this will empty the memory and append your zip content
$written = file_put_contents('php://memory', $text);

// bytes written to memory
var_dump($written);

// new instance of the ZipArchive
$zip = new ZipArchive;

// success of the archive reading
var_dump(true === $zip->open('php://memory'));

Upvotes: 20

ddjikic

Reputation: 1284

If you want to read the contents of a file inside a zip, such as an XML file, take a look at this. I use it to count the words in a .docx file (which is a zip archive):

if (!function_exists('docx_word_count')) {
    function docx_word_count($filename)
    {
        $zip = new ZipArchive();
        if ($zip->open($filename) === true) {
            if (($index = $zip->locateName('docProps/app.xml')) !== false) {
                $data = $zip->getFromIndex($index);
                $zip->close();
                $xml = new SimpleXMLElement($data);
                // cast so the function always returns an integer, like the fallback below
                return (int) $xml->Words;
            }
            $zip->close();
        }
        return 0;
    }
}

Upvotes: 0
