Greg Lovern
Greg Lovern

Reputation: 977

Not seeing PDF fields from Excel VBA

I've been tasked with creating an Excel tool to import data from thousands of PDFs into Excel. I've installed Adobe Acrobat X Professional, and it seems to be working fine. I'm using Excel 2007.

I found sample code here: http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/

And modifying it only slightly to prompt me to browse to the PDF file, and setting a reference to "Adobe Acrobat 10.0 Type Library", I'm trying this code:

Sub Main()
Dim AcroApp As Acrobat.CAcroApp
Dim theForm As Acrobat.CAcroPDDoc
Dim jso As Object
Dim text1, text2 As String

Set AcroApp = CreateObject("AcroExch.App")
Set theForm = CreateObject("AcroExch.PDDoc")

Dim strFullPath As String
strFullPath = Application.GetOpenFilename()
theForm.Open (strFullPath) 'theForm.Open ("C:\temp\sampleForm.pdf")

Set jso = theForm.GetJSObject

' get the information from the form fields Text1 and Text2
text1 = jso.getfield("MFR_ctrl33605579").Value 'jso.getfield("Text1").Value
'(etc)
End Sub

("MFR_ctrl33605579" is the name of one of the text fields; I found that in the editor that is part of or included with Acrobat Pro, called "Adobe LiveCycle Designer".)

It runs without error until it gets to the getfield method. Then I get the error "Object required".

If I run TypeName on AcroApp, theForm, and jso, I get CAcroApp, CAcroPDDoc, and object respectively:

?typename(AcroApp)
CAcroApp
?typename(theForm)
CAcroPDDoc
?typename(jso)
Object

If I put them in the Watches window, I initially get plus signs to the left of all three of them, but if I click on those plus signs, only one line appears under each one, saying "< No Variables >" in the Value column.

I wondered if the document was somehow locked against being read in this way. I read that if it was locked, a padlock would appear at the upper left in Acrobat, and it would not allow saving as text. But no padlock appears that I see, and it does allow saving as plain text. So as far as I can tell it doesn't seem to be locked.

Any suggestions on what I can try?

UPDATE:

I'm just getting started trying the suggestions, but I wanted to note that after trying rheitzman's For loop to get the field name (using getNthFieldName), I see that my code works if I use this for the field name:

form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]

Or in other words:

text1 = jso.getfield("MFR_form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]").Value

That would allow me to identify fields by that very long, apparently fully-qualified reference, which would get me through the project. But first I'm going to check the other ideas to see if the fields can be found by short name only.

UPDATE 2:

I see now that I can inspect some details of some of the object model in the Object Browser (filter Libraries on Acrobat), even though it doesn't show up in the Watch window.

However, it doesn't show objects created by methods, such as the object created by the AcroExch.PDDoc.GetJSObject method (see AcroPDDoc in the object model). The method of that object used here, getNthFieldName, does not appear in the object browser at all.

And, I don't see any other way there of identifying the fields by their short field names.

So ... for this project, I'm just going to use the long field names returned by getNthFieldName.

Upvotes: 1

Views: 6199

Answers (2)

Parfait
Parfait

Reputation: 107587

Not quite intuitive in the Adobe Acrobat OLE documentation but connsider using the AvDoc (UI) object as a "middle man" between document and app (Form < UI < App). Essentially, the AvDoc is the user interface object that the user interacts with while the PDDoc is the background object the user does not see but Adobe Acrobat maintains access to.

Notice too I open the document with formDoc, a different PDDoc, than the one used for JSObject as formDoc inherits from the AVDoc object. Again, this automation is not well documented as Adobe leans more to Javascript than VBA, so this is mostly learned through trial and error.

Sub Main()
    Dim AcroApp As Acrobat.CAcroApp
    Dim formUI As Acrobat.CAcroAVDoc
    Dim srcDoc As Acrobat.CAcroPDDoc, formDoc As Acrobat.CAcroPDDoc
    Dim jso As Object
    Dim strFullPath As String, text1 As String, text2 As String

    Set AcroApp = CreateObject("AcroExch.App")
    Set formUI = CreateObject("AcroExch.AVDoc")
    Set srcDoc = CreateObject("AcroExch.PDDoc")

    strFullPath = Application.GetOpenFilename()
    srcDoc.Open (strFullPath)

    If formUI.Open(strFullPath, "") = True Then
        Set formDoc = formUI.GetPDDoc()        
        Set jso = formDoc.GetJSObject()

        ' EXTRACT FORM FIELDS' TEXT
         text1 = jso.GetField("MFR_ctrl33605579").Value    
    End If

   ' UNINTIALIZING PDF OBJECTS
    Set jso = Nothing
    Set formDoc = Nothing         
    Set srcDoc = Nothing 
    Set formUI = Nothing
    Set AcroApp = Nothing

End Sub

Upvotes: 0

rheitzman
rheitzman

Reputation: 2297

iTextSharp is much easier to use!

That said, here's a snippet to read all fields.

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim pdfForm As Acrobat.AcroPDDoc
    Dim jso As Object ' ?
    Dim s As String = ""
    Dim i As Integer
    Dim strFullPath As String = "H:\OIS\ENFORCE\OPEN_BURN\Ag\temp\Open_Burn_Template_Out.pdf"
    Try
        pdfForm = New Acrobat.AcroPDDoc
        pdfForm.Open(strFullPath)
        jso = pdfForm.GetJSObject
        For i = 0 To jso.numfields() - 1
            s = jso.getNthFieldName(i)
            Debug.Print(s & ": " & jso.getField(s).value)
        Next
    Catch ex As Exception
        MsgBox(ex.Message)
    End Try
End Sub

It could be your "form" doesn't have any fields (or a flattened form), or the names are off.

The JSObject properties are a bit of a mystery. I found a post that used reflection to extract data but it just so happens that the JSObject properties are usable if you know what they are! e.g. numfields, getNthFieldName

If anyone finds a reference link for JSObject please post in comments.

Upvotes: 2

Related Questions