Reputation: 2402
I am rather new to C# and am trying to learn it in a more practical way to build interest and understanding. I have code that parses a PDF file (https://slicedinvoices.com/pdf/wordpress-pdf-invoice-plugin-sample.pdf) and it works well. However, I would like to write the output to memory instead of the console, so that I can search it for the invoice number later.
My current code for writing to the console:
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace PDF_file_reader
{
    class Program
    {
        static void Main(string[] args)
        {
            List<int> InvoiceNumbers = new List<int>();
            string filePath = @"C:\temp\parser\Invoice_Template.pdf";
            int pagesToScan = 2;
            string strText = string.Empty;
            try
            {
                PdfReader reader = new PdfReader(filePath);
                for (int page = 1; page <= pagesToScan; page++) //(int page = 1; page <= reader.NumberOfPages; page++) <- for scanning all the pages in a PDF
                {
                    ITextExtractionStrategy its = new LocationTextExtractionStrategy();
                    strText = PdfTextExtractor.GetTextFromPage(reader, page, its);
                    strText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(strText)));
                    //creating the string array and storing the PDF line by line
                    string[] lines = strText.Split('\n');
                    foreach (string line in lines)
                    {
                        {
                            //Console.WriteLine($"<{line}>");
                            Console.WriteLine(line.ToString());
                        }
                    }
                    Console.Read();
                }
            }
            catch (Exception ex)
            {
                Console.Write(ex);
            }
        }
    }
}
Here is the output in the console:
How can I write to the InvoiceNumbers list instead of the console, as I am doing now, and then search it? I guess searching would not be possible with my current setup?
Upvotes: 0
Views: 615
Reputation: 36
Just a note, you have an extra set of { } in your foreach loop surrounding Console.WriteLine() that you can remove.
If you want to store the whole invoice number as it is highlighted in your screenshot ("INV-3337" instead of just "3337"), InvoiceNumbers needs to be a list of strings, not ints.
Assuming the invoice layout is always going to be the same, or the number is always going to be in the same format (i.e. "Invoice Number INV-####"), you could just add a line to your foreach loop. Since each line is a string, you can check whether line contains "Invoice Number". If it does, remove the phrase "Invoice Number", trim the result to get rid of any whitespace, and add it to InvoiceNumbers. Either above or below Console.WriteLine(line.ToString()); you would just add:
if (line.Contains("Invoice Number"))
    InvoiceNumbers.Add(line.Replace("Invoice Number", "").Trim());
(I used Replace() instead of Remove() because with Remove() you would need to know the start position and length of the phrase you want to remove. In my opinion, Replace() is the safest route for this particular situation.)
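To illustrate the difference, here is a minimal sketch that strips the label both ways. The class name InvoiceLineDemo and its method names are made up for this example, and the sample line in the usage note is only an assumption about what the extracted text looks like.

```csharp
using System;

static class InvoiceLineDemo
{
    // The label expected in front of the invoice number (assumed format).
    const string Label = "Invoice Number";

    // Replace(): no positions needed; every occurrence of the label is dropped.
    public static string StripWithReplace(string line)
    {
        return line.Replace(Label, "").Trim();
    }

    // Remove(): needs the start index and the number of characters to cut.
    public static string StripWithRemove(string line)
    {
        int start = line.IndexOf(Label, StringComparison.Ordinal);
        if (start < 0)
        {
            return line.Trim(); // label not found, nothing to cut
        }
        return line.Remove(start, Label.Length).Trim();
    }
}
```

For a line such as "Invoice Number INV-3337", both methods give "INV-3337", but the Remove() version needs the extra IndexOf() lookup and a guard for the case where the label is missing.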
You can also add break; to the if statement if that's all you're looking for. This will stop the foreach loop over the lines of the current page (the outer for loop over the pages keeps running). Once you extract the invoice number, there is no reason to look through the rest of the document, unless you have multiple invoices in one document.
if (line.Contains("Invoice Number"))
{
    InvoiceNumbers.Add(line.Replace("Invoice Number", "").Trim());
    break;
}
If you want to search through the list for a particular invoice number, this answer should help with that.
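For a simple exact-match lookup, something along these lines could work. This is only a sketch: InvoiceSearch and its method names are hypothetical, it assumes InvoiceNumbers has already been changed to a List<string>, and the case-insensitive comparison is my own choice rather than anything the invoice format requires.

```csharp
using System;
using System.Collections.Generic;

static class InvoiceSearch
{
    // True if the invoice number is in the list (case-insensitive comparison).
    public static bool HasInvoice(List<string> invoiceNumbers, string wanted)
    {
        return invoiceNumbers.Exists(
            n => string.Equals(n, wanted, StringComparison.OrdinalIgnoreCase));
    }

    // Position of the invoice number in the list, or -1 if it is absent.
    public static int IndexOfInvoice(List<string> invoiceNumbers, string wanted)
    {
        return invoiceNumbers.FindIndex(
            n => string.Equals(n, wanted, StringComparison.OrdinalIgnoreCase));
    }
}
```

You would call it as InvoiceSearch.HasInvoice(InvoiceNumbers, "INV-3337") once the parsing loop has finished filling the list.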
This is assuming that the only difference would be the actual number. If it's not, you could always look into regular expressions and have it look for a pattern like "INV-\d*". That would also be assuming the invoice number format is always the same.
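A rough sketch of the regular expression route is below. InvoicePattern and FindAll are hypothetical names, and I used \d+ instead of \d* so that a bare "INV-" with no digits is not picked up; adjust the pattern if your real invoice numbers look different.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class InvoicePattern
{
    // "INV-" followed by one or more digits (assumed invoice number format).
    static readonly Regex InvoiceRegex = new Regex(@"INV-\d+");

    // Collects every invoice number found in the given lines.
    public static List<string> FindAll(IEnumerable<string> lines)
    {
        var found = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match m in InvoiceRegex.Matches(line))
            {
                found.Add(m.Value);
            }
        }
        return found;
    }
}
```

Passing the lines array from the parsing loop to InvoicePattern.FindAll(lines) returns only the matched numbers, so there is no label text to strip afterwards.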
Upvotes: 1