Reputation: 161
I am trying to extract images from a PDF using the following code. It works well for some filters such as DCTDecode, but not for JPXDecode. A "Parameter is not valid" error occurs at the point where image.GetDrawingImage() is called.
using System.Drawing.Imaging;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
...
PdfReader pdf = new PdfReader(currfilename);
PdfReaderContentParser parser = new PdfReaderContentParser(pdf);
ImageRender listener = new ImageRender();
for (int i = 1; i <= pdf.NumberOfPages; i++)
{
    try
    {
        parser.ProcessContent(i, listener); // calls RenderImage() at this point
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
}
public void RenderImage(ImageRenderInfo renderInfo)
{
    PdfImageObject image = renderInfo.GetImage();
    PdfName filter = image.Get(PdfName.FILTER) as PdfName;
    if (renderInfo.GetRef() != null && image != null)
    {
        using (System.Drawing.Image dotnetImg = image.GetDrawingImage()) // exception occurs at this point
        {
            if (dotnetImg != null)
            {
                ImageNames.Add(string.Format("{0}.tiff", renderInfo.GetRef().Number));
                using (MemoryStream ms = new MemoryStream())
                {
                    dotnetImg.Save(ms, ImageFormat.Tiff);
                    Images.Add(ms.ToArray());
                }
            }
        }
    }
}
I tried the solutions from these questions:
Extract images using iTextSharp
Extract Image from a particular page in PDF
I was able to extract the raw image bytes using PdfReader.GetStreamBytesRaw(), but a "Parameter is not valid" exception always occurs when System.Drawing.Image.FromStream() is called on the memory stream.
I also checked the question "'Parameter is not valid' exception from System.Drawing.Image.FromStream() method", but could not find anything helpful.
Please help
Upvotes: 3
Views: 2241
Reputation: 161
Using FreeImage.dll solved the problem. The code is as follows:
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using FreeImageAPI;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
...
imagecount = 0;
PdfReader pdf = new PdfReader(currfilename);
PdfReaderContentParser parser = new PdfReaderContentParser(pdf);
ImageRender listener = new ImageRender();
for (int i = 1; i <= pdf.NumberOfPages; i++)
{
    try
    {
        parser.ProcessContent(i, listener); // calls RenderImage() at this point
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
}
for (int j = 0; j < listener.Images.Count; ++j)
{
    string imgpath = Path.Combine(Environment.CurrentDirectory, "Image" + imagecount + ".bmp");
    imagecount++; // give every image its own file name
    // wrap the raw image bytes in a memory stream
    using (MemoryStream imageStream = new MemoryStream(listener.Images[j]))
    {
        // create a FIBITMAP from that stream
        FIBITMAP dib = FreeImage.LoadFromStream(imageStream);
        if (dib.IsNull) continue;
        // turn it into a normal Bitmap and save it explicitly as BMP
        using (Bitmap bitmap = FreeImage.GetBitmap(dib))
        {
            bitmap.Save(imgpath, ImageFormat.Bmp);
        }
        // unload the FIBITMAP
        FreeImage.UnloadEx(ref dib);
    }
    // reload the saved file; dispose img when done, since FromFile keeps the file locked
    System.Drawing.Image img = System.Drawing.Image.FromFile(imgpath);
}
public void RenderImage(ImageRenderInfo renderInfo)
{
    PdfImageObject image = renderInfo.GetImage();
    if (renderInfo.GetRef() != null && image != null)
    {
        // keep the image bytes as-is; FreeImage decodes them later
        byte[] tempImage = image.GetImageAsBytes();
        ImageNames.Add(string.Format("{0}.bmp", renderInfo.GetRef().Number));
        Images.Add(tempImage);
    }
}
I followed the instructions given here to add FreeImage.NET to the solution.
Upvotes: 3
Reputation: 10418
The JPXDecode filter corresponds to JPEG 2000 compression, which System.Drawing in the .NET Framework does not support; that is why the image cannot be loaded. This other question on SO may help: JPEG 2000 support in C#.NET
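To decide at run time whether extracted bytes need a JPEG 2000 capable decoder, one option is to look at the first few bytes. The sketch below is only an illustration and not part of iTextSharp or FreeImage: the class name Jpeg2000Sniffer and the method IsJpeg2000 are made up for this example. It checks for the two standard ways JPEG 2000 data can start: the 12-byte JP2 signature box, or a bare codestream beginning with the SOC and SIZ markers.

```csharp
public static class Jpeg2000Sniffer
{
    // A JP2 container file begins with this 12-byte signature box.
    private static readonly byte[] Jp2Signature =
        { 0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A };

    // A bare JPEG 2000 codestream begins with the SOC marker (FF 4F)
    // immediately followed by the SIZ marker (FF 51).
    private static readonly byte[] CodestreamStart = { 0xFF, 0x4F, 0xFF, 0x51 };

    // Returns true if the buffer starts like JPEG 2000 data (hypothetical helper).
    public static bool IsJpeg2000(byte[] data)
    {
        return StartsWith(data, Jp2Signature) || StartsWith(data, CodestreamStart);
    }

    // Compares the beginning of data with prefix, byte by byte.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Assuming the bytes returned by image.GetImageAsBytes() are the undecoded JPEG 2000 data, RenderImage could call Jpeg2000Sniffer.IsJpeg2000(bytes) and hand only those images to a decoder such as FreeImage, while still using GetDrawingImage() for everything else.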
Upvotes: 3