Reputation: 55
I need to split a large scanned PDF, which consists of many documents of differing lengths, into separate PDF files.
I know one way of doing this is to insert a separator page between the documents before scanning them all in one go. Typically the separator page carries a barcode; when the barcode is detected, a new PDF file is started.
I would prefer to do this in .NET but am open to other suggestions. I have looked on this site at a couple of popular libraries, iTextSharp and PDFsharp. I have only found examples that split a PDF into parts with a fixed number of pages, none where the parts differ in length.
I am not sure this is possible with these libraries. Does anyone know whether it is, or have an idea for an alternative?
Upvotes: 1
Views: 2656
Reputation: 179
I am in the same situation and found a solution provided by ByteScout.
After downloading BarCodeReader.dll, the sample code is:
using System;
using System.IO;
using System.Linq;
using System.Text;
using Bytescout.BarCodeReader;

namespace SplitByBarcode
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputFile = @"abc.pdf";
            Console.WriteLine("Processing file " + inputFile);

            using (Reader reader = new Reader())
            {
                reader.RegistrationName = "demo";
                reader.RegistrationKey = "demo";
                reader.BarcodeTypesToFind.Code128 = true; // EAN-128 is the same as Code 128
                reader.PDFRenderingResolution = 96;

                FoundBarcode[] barcodes = reader.ReadFrom(inputFile);
                Console.WriteLine("Found " + barcodes.Length + " barcodes");

                if (barcodes.Length > 0)
                {
                    StringBuilder pageRanges = new StringBuilder();

                    // Create string containing page ranges to extract in the form "1-4,6-8,10-11,12-"
                    for (int i = 0; i < barcodes.Length; i++)
                    {
                        FoundBarcode barcode = barcodes[i];
                        pageRanges.Append(barcode.Page + 2); // +1 because we skip the page with barcode and another +1 because need 1-based page numbers
                        pageRanges.Append("-");
                        if (i < barcodes.Length - 1)
                        {
                            pageRanges.Append(barcodes[i + 1].Page);
                            pageRanges.Append(",");
                        }
                    }

                    Console.WriteLine("Extracting page ranges " + pageRanges);

                    // Split document
                    string[] splittedParts = reader.SplitDocument(inputFile, pageRanges.ToString());

                    // Rename parts according to barcode values
                    for (int i = 0; i < splittedParts.Length; i++)
                    {
                        string fileName = barcodes[i].Value + ".pdf";
                        File.Delete(fileName);
                        File.Move(splittedParts[i], fileName);
                        Console.WriteLine("Saved file " + fileName);
                    }
                }
            }

            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}
Hope this helps.
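The range-building loop in the sample assumes that the first page is a separator and that no two separators are adjacent. A minimal sketch of the same calculation with those cases handled is shown here. `SeparatorRanges.Build` is a hypothetical helper, not part of the ByteScout SDK; it assumes 0-based separator page indexes (as the sample's comment implies for `barcode.Page`) given in ascending order, and a known total page count.

```csharp
using System;
using System.Collections.Generic;

public static class SeparatorRanges
{
    // separatorPages: 0-based indexes of the pages that carry a separator
    //                 barcode, in ascending order (assumption).
    // totalPages:     number of pages in the scanned PDF.
    // Returns 1-based inclusive ranges such as "2-4,6-8,10-11",
    // leaving out the separator pages themselves.
    public static string Build(IList<int> separatorPages, int totalPages)
    {
        var ranges = new List<string>();
        int start = 1; // 1-based first page of the document being collected

        foreach (int separator in separatorPages)
        {
            // A 0-based separator index equals the 1-based number
            // of the page just before the separator.
            int end = separator;
            if (end >= start)
            {
                ranges.Add(start + "-" + end);
            }
            start = separator + 2; // 1-based page right after the separator
        }

        // Pages after the last separator form the final document.
        if (start <= totalPages)
        {
            ranges.Add(start + "-" + totalPages);
        }

        return string.Join(",", ranges);
    }
}
```

For example, `SeparatorRanges.Build(new[] { 0, 4, 8 }, 11)` gives `"2-4,6-8,10-11"`. Two caveats: pages that precede the first separator become a range of their own, so the parts no longer line up one-to-one with the barcode values used for renaming, and whether `SplitDocument` accepts a single-page range such as `"5-5"` should be checked against the SDK documentation.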
Upvotes: 1
Reputation: 77606
It's not exactly clear what you want to do, but this is one way to read a file src, select pages 1-10, and create a file dest with only those pages:
PdfReader reader = new PdfReader(src);
reader.SelectPages("1-10");
PdfStamper stamper = new PdfStamper(reader, new FileStream(dest, FileMode.Create));
stamper.Close();
An alternative would be to use PdfCopy. Again you create a reader object:
PdfReader reader = new PdfReader(src);
Now you can use this reader object to create different files, where start and end are the page numbers at which each new file should start and end.
FileStream fs = new FileStream(dest, FileMode.Create);
using (Document document = new Document()) {
    using (PdfCopy copy = new PdfCopy(document, fs)) {
        document.Open();
        // Copy pages start..end (1-based, end included) into the new file
        for (int i = start; i <= end; i++) {
            copy.AddPage(copy.GetImportedPage(reader, i));
        }
    }
}
This is all documented in my book, more specifically in chapter 6 (free download).
As you can choose the range of pages, you can split a document with X pages into Y documents, each with a different number of pages. Obviously, you have to define the number of pages of each separate document yourself. Libraries such as iTextSharp and PdfSharp see every scanned page as an image; they don't interpret what's on that page, so introducing a page with a barcode doesn't really make sense for them. However, if you add an annotation to the first page of each document (an annotation is an interactive object in the PDF, not part of the page content), then iText could split the document at the places where such an annotation is found.
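If the length of each document is known up front, the start and end values for each output file follow from a running total. The sketch below is only an illustration: `PageCounts.ToRanges` is a hypothetical helper, not an iTextSharp API, and it returns 1-based pairs in which the end page is included.

```csharp
using System;
using System.Collections.Generic;

public static class PageCounts
{
    // pageCounts: number of pages in each output document, in scan order.
    // Returns 1-based (start, end) pairs with the end page included.
    public static List<Tuple<int, int>> ToRanges(IEnumerable<int> pageCounts)
    {
        var ranges = new List<Tuple<int, int>>();
        int start = 1;

        foreach (int count in pageCounts)
        {
            if (count <= 0)
            {
                throw new ArgumentException("Each document needs at least one page.");
            }

            int end = start + count - 1;
            ranges.Add(Tuple.Create(start, end));
            start = end + 1; // the next document begins on the following page
        }

        return ranges;
    }
}
```

For page counts 3, 5 and 2 this yields (1, 3), (4, 8) and (9, 10). Each pair could then drive one pass of a PdfCopy loop that copies pages from start up to and including end, writing to a different dest file each time.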
Upvotes: 0