Reputation: 2852
I need to compare two PDF files for equality. The two files need to be identical in content, and I'm not having any success with the proposals found on:
https://stackoverflow.com/a/36108862/2807741
    public static bool AreFileContentsEqual(String path1, String path2) =>
        File.ReadAllBytes(path1).SequenceEqual(File.ReadAllBytes(path2));
and
https://stackoverflow.com/a/76917554/2807741
    private bool AreFilesEqual(string file1Path, string file2Path)
    {
        string file1Hash = "", file2Hash = "";
        SHA1 sha = new SHA1CryptoServiceProvider();
        using (FileStream fs = System.IO.File.OpenRead(file1Path))
        {
            byte[] hash;
            hash = sha.ComputeHash(fs);
            file1Hash = Convert.ToBase64String(hash);
        }
        using (FileStream fs = System.IO.File.OpenRead(file2Path))
        {
            byte[] hash;
            hash = sha.ComputeHash(fs);
            file2Hash = Convert.ToBase64String(hash);
        }
        return (file1Hash == file2Hash);
    }
(among other links I've tried).
I'm comparing two "identical" files and the comparison always returns false (it only returns true when I compare a file with itself).
This is how I created the files to compare:
Maybe something changes in the second file when it is saved, even though I'm not making any modifications to it?
file1.pdf:
file2.pdf
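One way to check whether saving really changes the bytes is to locate the first offset at which the two files differ. PDF writers often embed metadata such as creation/modification timestamps and a document ID, so two saves of the same content can differ at the byte level. The following is only a minimal diagnostic sketch; the class and method names (`FileDiff`, `FirstDifferenceOffset`) are hypothetical and not taken from either linked answer.

```csharp
using System;
using System.IO;

public static class FileDiff
{
    // Returns the offset of the first differing byte, or -1 if the contents match.
    // If one file is a prefix of the other, returns the length of the shorter file.
    public static long FirstDifferenceOffset(string path1, string path2)
    {
        using var s1 = File.OpenRead(path1);
        using var s2 = File.OpenRead(path2);
        long offset = 0;
        while (true)
        {
            int b1 = s1.ReadByte(); // -1 signals end of stream
            int b2 = s2.ReadByte();
            if (b1 != b2) return offset; // also covers "only one stream ended"
            if (b1 == -1) return -1;     // both ended together: identical contents
            offset++;
        }
    }
}
```

Opening both files in a hex viewer around the returned offset should show whether the difference sits in metadata rather than in the visible content.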
Edit 1:
When I say "Identical" I mean identical in content. The PDFs will contain amounts (numbers), and those amounts in the PDF bills must be exactly the same.
Upvotes: 0
Views: 192
Reputation: 2852
OK, I'll answer my own question. iText7 is the way to go, as it can extract the content of a PDF file as text, so the comparison is done on the text rather than on the raw bytes.
Nuget package: https://www.nuget.org/packages/itext7
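    // Required using directives (top of the file):
    using System.IO;
    using System.Text;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    public IActionResult Index()
    {
        var exeFilePath = System.Reflection.Assembly.GetExecutingAssembly().Location;
        var workPath = Path.Combine(Path.GetDirectoryName(exeFilePath), "Assets");
        var file1 = Path.Combine(workPath, "testpdfv1.pdf");
        var file2a = Path.Combine(workPath, "testpdfv2equalv1.pdf");
        // Swap file2a for file2b below to test the "different content" case.
        var file2b = Path.Combine(workPath, "testpdfv2differentv1.pdf");
        var fileContents1 = PdfToText(file1);
        var fileContents2 = PdfToText(file2a);
        // True when both PDFs contain the same extracted text.
        var filesAreEqual = fileContents1 == fileContents2;
        return View();
    }

    private string PdfToText(string pdfPath)
    {
        var result = new StringBuilder();
        // Dispose the document (and its reader) even if extraction throws.
        using (var pdfDocument = new PdfDocument(new PdfReader(pdfPath)))
        {
            for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
            {
                // Use a fresh strategy per page: a reused LocationTextExtractionStrategy
                // keeps the chunks of earlier pages, so their text would be repeated.
                var strategy = new LocationTextExtractionStrategy();
                result.Append(PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i), strategy));
            }
        }
        return result.ToString();
    }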
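If two PDFs carry the same amounts but the extracted text differs only in spacing or line breaks, the plain `==` comparison above would report them as different. The following is a minimal sketch of a more tolerant comparison, assuming whitespace differences are irrelevant for the bills; the names `PdfTextComparer`, `NormalizeWhitespace` and `TextEquals` are hypothetical helpers, not part of iText.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PdfTextComparer
{
    // Collapses every run of whitespace (spaces, tabs, line breaks) into a single
    // space and trims the ends, so layout-only differences are ignored.
    public static string NormalizeWhitespace(string text) =>
        Regex.Replace(text ?? string.Empty, @"\s+", " ").Trim();

    // Ordinal comparison of the normalized texts: digits and punctuation
    // must still match exactly.
    public static bool TextEquals(string a, string b) =>
        string.Equals(NormalizeWhitespace(a), NormalizeWhitespace(b), StringComparison.Ordinal);
}
```

It could be used as `PdfTextComparer.TextEquals(PdfToText(file1), PdfToText(file2a))`. If spacing is significant for your documents, keep the strict `==` comparison instead.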
Upvotes: 0