I have implemented embedding SWF files in a PDF using iTextSharp. Is it possible to do the reverse: given a PDF as input, extract the SWF files from it? If so, how can I do that?
Any ideas on how to start would be greatly appreciated.
Kind Regards,
Raghu.M
Here is a working example that takes this PDF with an embedded file attachment (the first one I found):
http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/fileAttachment.pdf
and extracts the embedded files, in this case a single KSBASE.WQ2 file.
```csharp
using System.IO;
using iTextSharp.text.pdf;

// Both methods go inside your own class.
public static void ExtractAttachments(string src, string dir)
{
    PdfReader reader = new PdfReader(Path.Combine(dir, src));
    try
    {
        PdfDictionary root = reader.Catalog;
        PdfDictionary names = root.GetAsDict(PdfName.NAMES);
        if (names == null) return;       // no name dictionary, so no embedded files
        PdfDictionary embedded = names.GetAsDict(PdfName.EMBEDDEDFILES);
        if (embedded == null) return;
        // The /Names array alternates: name string, file specification, name string, ...
        PdfArray filespecs = embedded.GetAsArray(PdfName.NAMES);
        if (filespecs == null) return;   // tree is split into /Kids, not handled here
        for (int i = 0; i < filespecs.Size; )
        {
            ExtractAttachment(reader, dir, filespecs.GetAsString(i++),
                filespecs.GetAsDict(i++));
        }
    }
    finally
    {
        reader.Close();
    }
}

protected static void ExtractAttachment(PdfReader reader, string dir, PdfString name, PdfDictionary filespec)
{
    // /EF maps /F (and possibly /UF) to the embedded file stream
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
    foreach (PdfName key in refs.Keys)
    {
        PRStream stream = (PRStream)PdfReader.GetPdfObject(refs.GetAsIndirectObject(key));
        string filename = filespec.GetAsString(key).ToString();
        // here you can do a filename.Contains(".swf") check
        byte[] fileBytes = PdfReader.GetStreamBytes(stream);
        File.WriteAllBytes(Path.Combine(dir, filename), fileBytes);
    }
}
```
You would call this as follows:
```csharp
var dir = "C:\\temp\\PdfExtract";
ExtractAttachments("fileAttachment.pdf", dir);
```
You can simply add a `filename.Contains(".swf")` check on the file names before extracting.
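An attachment's name does not always end in `.swf`, so checking the file's leading bytes can be a useful fallback. Below is a minimal sketch; `SwfSniffer`, `LooksLikeSwf` and `IsSwf` are hypothetical helper names, not part of iTextSharp. It relies on the SWF header starting with `FWS` (uncompressed), `CWS` (zlib) or `ZWS` (LZMA), followed by a version byte and a 4-byte file length.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp.
public static class SwfSniffer
{
    // A SWF file starts with a fixed 8-byte header: a 3-byte signature
    // ("FWS", "CWS" or "ZWS"), a version byte and a 4-byte file length.
    public static bool LooksLikeSwf(byte[] data)
    {
        if (data == null || data.Length < 8)
            return false;
        byte first = data[0];
        bool knownCompression = first == (byte)'F' || first == (byte)'C' || first == (byte)'Z';
        return knownCompression && data[1] == (byte)'W' && data[2] == (byte)'S';
    }

    // Accepts the attachment if either the name or the content looks like SWF.
    public static bool IsSwf(string filename, byte[] data)
    {
        bool nameMatches = filename != null
            && filename.EndsWith(".swf", StringComparison.OrdinalIgnoreCase);
        return nameMatches || LooksLikeSwf(data);
    }
}
```

Inside `ExtractAttachment` you could then skip everything else with `if (!SwfSniffer.IsSwf(filename, fileBytes)) continue;` just before `File.WriteAllBytes`.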
Update
OK, this is how I would track them down if the approach above does not work.
In that case the files must be stored somewhere else in the document structure. Without seeing your file, this is how I would approach it.
I would set a breakpoint after `root` is assigned, then inspect it in the debugger to find where the SWF files are stored.
If you look into `root.Keys` you will see what the `Catalog` contains.
To retrieve any dictionary object, use the `GetAsDict` method, passing in the matching `PdfName`.
Stepping one level further down, into the `Names` dictionary, you can see that it contains `EmbeddedFiles`, and so forth.
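If stepping through the debugger gets tedious, you could print the structure instead. This is a minimal sketch; `PdfStructureDumper` and `Dump` are hypothetical names. It only uses `PdfDictionary.Keys`, `PdfDictionary.Get` and `PdfReader.GetPdfObject` (to resolve indirect references), and it stops at `maxDepth` levels and skips `/Parent` so it does not loop back up the page tree.

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper for exploring a PDF's object structure.
public static class PdfStructureDumper
{
    // Writes every key of dict with the .NET type of its (resolved) value,
    // recursing into nested dictionaries up to maxDepth levels.
    public static void Dump(PdfDictionary dict, TextWriter output, int maxDepth, int depth = 0)
    {
        foreach (PdfName key in dict.Keys)
        {
            // Resolve indirect references so we see the actual object.
            PdfObject value = PdfReader.GetPdfObject(dict.Get(key));
            string type = value == null ? "null" : value.GetType().Name;
            output.WriteLine("{0}{1} ({2})", new string(' ', depth * 2), key, type);

            // Streams derive from PdfDictionary, so their dictionaries are shown too.
            var child = value as PdfDictionary;
            if (child != null && depth + 1 < maxDepth && !PdfName.PARENT.Equals(key))
                Dump(child, output, maxDepth, depth + 1);
        }
    }
}
```

For example, `PdfStructureDumper.Dump(reader.Catalog, Console.Out, 3);` lists the catalog keys, their children and grandchildren. Page trees can get large, so keep `maxDepth` small at first.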
There are many predefined `PdfName` constants; there is even one for Flash.
As every document can be structured differently, it is just a case of investigating the structure and passing the correct parameters to `GetAsDict` in order to reach the files.
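One place worth checking: if the SWF files were added as RichMedia annotations (Adobe's Flash extension to PDF), they are usually not listed under the catalog's `EmbeddedFiles` at all. Instead, each page's `/Annots` array holds an annotation with `/Subtype /RichMedia`, whose `/RichMediaContent` dictionary has an `/Assets` name tree mapping asset names to file specifications. The sketch below assumes that layout and iTextSharp 5.x, where `PdfNameTree.ReadTree` returns a `Dictionary<string, PdfObject>`; `RichMediaExtractor` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper; assumes the Adobe RichMedia annotation layout.
public static class RichMediaExtractor
{
    // Built by hand in case your iTextSharp version lacks constants for them.
    static readonly PdfName RichMedia = new PdfName("RichMedia");
    static readonly PdfName RichMediaContent = new PdfName("RichMediaContent");
    static readonly PdfName Assets = new PdfName("Assets");

    // Returns asset name -> file specification for one annotation
    // (empty if it is not a RichMedia annotation or has no assets).
    public static Dictionary<string, PdfDictionary> GetRichMediaAssets(PdfDictionary annot)
    {
        var result = new Dictionary<string, PdfDictionary>();
        if (!RichMedia.Equals(annot.GetAsName(PdfName.SUBTYPE)))
            return result;
        PdfDictionary content = annot.GetAsDict(RichMediaContent);
        PdfDictionary assets = content == null ? null : content.GetAsDict(Assets);
        if (assets == null)
            return result;
        // ReadTree flattens the name tree, following /Kids as well as /Names.
        foreach (KeyValuePair<string, PdfObject> entry in PdfNameTree.ReadTree(assets))
        {
            var filespec = PdfReader.GetPdfObject(entry.Value) as PdfDictionary;
            if (filespec != null)
                result[entry.Key] = filespec;
        }
        return result;
    }

    // Walks every page's annotations and writes each asset's /F stream to dir.
    public static void ExtractRichMediaAssets(PdfReader reader, string dir)
    {
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            PdfArray annots = reader.GetPageN(page).GetAsArray(PdfName.ANNOTS);
            if (annots == null) continue;
            for (int i = 0; i < annots.Size; i++)
            {
                PdfDictionary annot = annots.GetAsDict(i);
                if (annot == null) continue;
                foreach (var asset in GetRichMediaAssets(annot))
                {
                    PdfDictionary ef = asset.Value.GetAsDict(PdfName.EF);
                    var stream = ef == null ? null : PdfReader.GetPdfObject(ef.Get(PdfName.F)) as PRStream;
                    if (stream == null) continue;
                    // GetFileName guards against path separators in the asset name.
                    string target = Path.Combine(dir, Path.GetFileName(asset.Key));
                    File.WriteAllBytes(target, PdfReader.GetStreamBytes(stream));
                }
            }
        }
    }
}
```

Splitting out `GetRichMediaAssets` keeps the lookup testable on a single annotation dictionary, separate from the page loop and file I/O.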