How to parse PDF content into a database with PowerShell

Question

I have a PDF document that I would like to extract content from. When I search for the IMEI keyword, the script finds it, but what I need is the actual IMEI value, which is the next item in the loop.

In the PDF the value looks like this: IMEI 90289393092

The script below returns this instead: -0.1 -8.8 9.8 -0.1 446.7 403.9 Tm (IMEI:) Tj

I only want to have the value: 90289393092

Script I am using:

Add-Type -Path .\itextsharp.dll
$reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList "$pwd\PDF\DOC001.pdf"

for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
    # Decode the raw content stream of the page and split it into lines
    $lines = [char[]]$reader.GetPageContent($page) -join "" -split "`n"
    foreach ($line in $lines) {
        if ($line -match "IMEI") {
            # Unescape characters such as \( and \) inside PDF strings
            $line = $line -replace '\\(\S)', '$1'
            # Strip the [( ... )]TJ wrapper and the kerning numbers between fragments
            $line -replace '^\[\(|\)\]TJ$', '' -split '\)-?\d+\.?\d*\(' -join ''
        }
    }
}
$reader.Close()
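
One direction that might help, sketched under assumptions rather than tested against DOC001.pdf: instead of walking the raw content stream, let iTextSharp's PdfTextExtractor rebuild the page text, then pull the digits that follow the keyword with a regex. Get-ImeiFromText is a hypothetical helper name, and the pattern assumes the extracted text reads like "IMEI: 90289393092" or "IMEI 90289393092" (optional colon, then the digits, possibly on the next line). The match is case-sensitive.

```powershell
# Hypothetical helper: returns every run of digits that follows "IMEI" in a block of text.
function Get-ImeiFromText {
    param([string]$Text)
    # Optional colon, optional whitespace (including a line break), then the digits.
    [regex]::Matches($Text, 'IMEI:?\s*(\d+)') | ForEach-Object { $_.Groups[1].Value }
}

# The helper works on plain text, so it can be tried without a PDF:
Get-ImeiFromText 'IMEI: 90289393092'

# With a PDF (assumes iTextSharp 5.x as itextsharp.dll next to the script,
# and the same document path as in the question):
if (Test-Path .\itextsharp.dll) {
    Add-Type -Path .\itextsharp.dll
    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList "$pwd\PDF\DOC001.pdf"
    try {
        for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
            # Extracted page text instead of raw drawing operators
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
            Get-ImeiFromText $text
        }
    }
    finally {
        $reader.Close()
    }
}
```

Keeping the regex in its own function means the pattern can be adjusted and checked against sample strings before it is run over the whole document. If the extractor puts the label and the value far apart in the text, the pattern would need to change.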
