Reputation: 10345
I found a tutorial, "How to break a PDF into parts", that demonstrates how to split a PDF file into separate PDF files, either by page count or by maximum file size, using Adobe Acrobat.
I have found many examples on Stack Overflow of how to split a PDF by page with C#. But how can I do the latter? How can I split a PDF file into multiple PDF files by a maximum file size using C#?
For example, say I have a PDF file that is 70 pages and 40 MB. Instead of splitting into 7 PDF files of 10 pages each, how can I split the file into around 5 PDF files that are no greater than 10 MB each using C#?
So far, the best method I have seen is in "Using itextsharp to split a pdf into smaller pdf's based on size", where Cyfer13 used iTextSharp to split the file by page and then grouped those page files by size. But is there a more direct way to accomplish this without having to first split by page?
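To make the "group those page files by size" step concrete, here is a minimal sketch of a greedy grouping, not Cyfer13's actual code. The class name PageGrouper, the method GroupBySize, and the pageSizes input (one byte count per single-page file, e.g. taken from FileInfo.Length) are all hypothetical names introduced for illustration. It also assumes that summing the single-page file sizes is a good enough estimate of the merged file's size, which is only an approximation.

```csharp
using System;
using System.Collections.Generic;

public static class PageGrouper
{
    // Greedily packs consecutive pages into groups whose summed size stays
    // within limitBytes. pageSizes[i] is the size in bytes of the single-page
    // file for page i (hypothetical input, not part of any PDF library).
    public static List<List<int>> GroupBySize(IReadOnlyList<long> pageSizes, long limitBytes)
    {
        var groups = new List<List<int>>();
        var current = new List<int>();
        long currentSize = 0;

        for (int i = 0; i < pageSizes.Count; i++)
        {
            if (pageSizes[i] > limitBytes)
                throw new ArgumentException($"Page #{i + 1} alone exceeds the size limit.");

            // Start a new group when adding this page would exceed the limit.
            if (current.Count > 0 && currentSize + pageSizes[i] > limitBytes)
            {
                groups.Add(current);
                current = new List<int>();
                currentSize = 0;
            }

            current.Add(i);
            currentSize += pageSizes[i];
        }

        // Keep the final, partially filled group.
        if (current.Count > 0)
            groups.Add(current);

        return groups;
    }
}
```

For example, page sizes of 4, 5, 3, 6 and 2 with a limit of 10 (same unit) yield three groups of zero-based page indexes: {0, 1}, {2, 3} and {4}. Each group would then be merged into one output PDF.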
Upvotes: 1
Views: 5256
Reputation: 10345
Starting from the PDFsharp "Split Document" sample, I wrote the following SplitBySize method, where limit is the maximum file size in bytes:
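    // Requires: using System; using System.IO; using PdfSharp.Pdf; using PdfSharp.Pdf.IO;
    // CreateDocument is a helper that is not shown here; it presumably creates
    // an empty output PdfDocument based on the input, as the PDFsharp sample does.
    public static void SplitBySize(string filename, long limit)
    {
        PdfDocument input = PdfReader.Open(filename, PdfDocumentOpenMode.Import);
        PdfDocument output = CreateDocument(input);
        string name = Path.GetFileNameWithoutExtension(filename);
        // Scratch file used to measure the size of the document built so far.
        string temp = string.Format("{0} - {1}.pdf", name, 0);
        int j = 1; // index of the output file currently being filled
        for (int i = 0; i < input.PageCount; i++)
        {
            PdfPage page = input.Pages[i];
            output.AddPage(page);
            output.Save(temp);
            FileInfo info = new FileInfo(temp);
            if (info.Length <= limit)
            {
                // Still within the limit: this save becomes the latest version of file j.
                string path = string.Format("{0} - {1}.pdf", name, j);
                if (File.Exists(path))
                {
                    File.Delete(path);
                }
                File.Move(temp, path);
            }
            else
            {
                if (output.PageCount > 1)
                {
                    // Over the limit: file j keeps its previous version,
                    // and the current page is retried in a fresh document.
                    output = CreateDocument(input);
                    ++j;
                    --i;
                }
                else
                {
                    // A single page on its own already exceeds the limit.
                    throw new Exception(
                        string.Format("Page #{0} is greater than the document size limit of {1} MB (size = {2} bytes)",
                            i + 1,
                            limit / 1E6,
                            info.Length));
                }
            }
        }
    }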
I will continue to test, but it is working so far.
Upvotes: 3
Reputation: 3663
This is untested sample code. It assumes you are prepared to split at the purely binary level, i.e. the parts cannot be opened by a PDF reader, and you will have to rejoin them to make the file readable again.
The code below first reads the PDF file into a byte[] array. Then, based on an arbitrary number of partitions (5 in this example), it computes the size of each binary part. It then creates a temporary MemoryStream and loops to read each partition and write it to a new .part file. (You might need to make some changes to make this workable.)
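    byte[] pdfBytes = File.ReadAllBytes(@"c:\foo.pdf");
    int parts = 5;
    int fileSize = pdfBytes.Length / parts; // assuming foo.pdf is 40 MB, each part is about 8 MB
    MemoryStream m = new MemoryStream(pdfBytes);
    for (int i = 0; i < parts; i++)
    {
        // The last part also takes the remainder left by the integer division.
        int count = (i == parts - 1) ? pdfBytes.Length - i * fileSize : fileSize;
        byte[] tbytes = new byte[count];
        m.Read(tbytes, 0, count); // the offset argument indexes into tbytes, not the stream
        File.WriteAllBytes(@"c:\foo" + i + ".part", tbytes);
    }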
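Since the parts have to be rejoined before the PDF is readable again, here is a minimal sketch of that reverse step. It assumes the parts were written with the naming scheme above (a base path followed by the part index and ".part") and that the caller knows how many parts exist. The class name PartJoiner, the method Join, and its parameters are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;

public static class PartJoiner
{
    // Concatenates basePath + "0.part", basePath + "1.part", ... in index
    // order into outputPath. partCount is supplied by the caller.
    public static void Join(string basePath, int partCount, string outputPath)
    {
        using (FileStream output = File.Create(outputPath))
        {
            for (int i = 0; i < partCount; i++)
            {
                using (FileStream part = File.OpenRead(basePath + i + ".part"))
                {
                    // Append this part's bytes to the end of the output file.
                    part.CopyTo(output);
                }
            }
        }
    }
}
```

A call such as PartJoiner.Join(@"c:\foo", partCount, @"c:\foo-rejoined.pdf"), with partCount set to however many parts were written, should reproduce the original bytes as long as no part is missing or reordered.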
Upvotes: 0