Reputation: 17
I am trying to search for a keyword within a PDF file using C# and iTextSharp.
I came across this piece of code:
public List<int> ReadPdfFile(string fileName, string searchText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            if (currentPageText.Contains(searchText))
            {
                pages.Add(page);
            }
        }
        pdfReader.Close();
    }
    return pages;
}
But the compiler says that PdfReader does not contain a definition for NumberOfPages. Is there another way to get the number of pages in a PDF file?
Upvotes: 1
Views: 1802
Reputation: 96064
The piece of code you found is for iText 5.5.x. iText 7 has a fundamentally changed API, so your NumberOfPages problem is not the only problem you'll have to deal with.
Nonetheless: to get the number of pages in iText 7, you now use the PdfDocument method GetNumberOfPages instead of the former PdfReader property NumberOfPages.
And more generally, a port of your method to iText 7 might look like this:
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public List<int> ReadPdfFile(string fileName, string searchText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        using (PdfReader pdfReader = new PdfReader(fileName))
        using (PdfDocument pdfDocument = new PdfDocument(pdfReader))
        {
            // Page numbers are 1-based in iText.
            for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentPageText = PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(page), strategy);
                if (currentPageText.Contains(searchText))
                {
                    pages.Add(page);
                }
            }
        }
    }
    return pages;
}
Upvotes: 1
Reputation: 2418
You can replace pdfReader.NumberOfPages with getNumberOfPdfPages(fileName), using this method (reference):
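using System.IO;
using System.Text.RegularExpressions;

public int getNumberOfPdfPages(string fileName)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
    {
        // Count "/Type /Page" entries in the raw file text; [^s] skips "/Type /Pages".
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());
        return matches.Count;
    }
}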
But it seems odd that NumberOfPages is not recognized... Are you sure about your using directives?
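To see what the regular expression in this answer actually counts, here is a minimal sketch that runs the same pattern against a hand-written fragment of PDF syntax. The class name PageCountSketch, the method CountPageObjects, and the sample string are hypothetical and exist only for illustration. The approach assumes the page dictionaries are stored uncompressed in the file; PDFs that keep their objects inside compressed object streams may not expose "/Type /Page" as plain text, so the count can come out too low.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageCountSketch
{
    // Same pattern as in the answer: "/Type", optional whitespace, "/Page",
    // then any single character other than 's' (so "/Pages" is skipped).
    private static readonly Regex PageObject = new Regex(@"/Type\s*/Page[^s]");

    public static int CountPageObjects(string rawPdfText)
    {
        return PageObject.Matches(rawPdfText).Count;
    }

    public static void Main()
    {
        // Hypothetical fragment: one page tree node and two page objects.
        string sample =
            "1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n" +
            "2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n" +
            "3 0 obj << /Type/Page /Parent 1 0 R >> endobj\n";

        Console.WriteLine(CountPageObjects(sample)); // prints 2
    }
}
```

The "/Type /Pages" entry is the page tree node, not a page, which is why the pattern excludes a trailing 's'. When iText is already referenced, asking the library for the page count is the more robust choice.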
Upvotes: 0