finnjgr

Reputation: 457

Access PDF Files using the Gemini API

I am trying to extract information from a PDF using the Gemini API (1.5 Pro). Right now, it seems that the API can only accept text, audio files, or images. Is there any way for it to access PDF files (e.g. via Vertex AI or Google Drive uploads)? The current documentation is quite unclear.

Upvotes: 5

Views: 6114

Answers (3)

Bhaskar

Reputation: 11

Check if the code below helps. In my case, I extracted a vehicle number and a date from a PDF file with Gemini and wrote them to an Excel file. The logic should be similar for other tools such as ChatGPT, but I haven't verified that.
import google.generativeai as genai
from PyPDF2 import PdfReader
import json
import pandas as pd
import re

def extract_text_from_pdf(pdf_path):
    # Concatenate the text layer of every page; scanned (image-only) pages yield no text
    with open(pdf_path, 'rb') as pdf_file:
        pdf_reader = PdfReader(pdf_file)
        text = ""

        for page in pdf_reader.pages:
            text += page.extract_text() or ""

        return text
    
def create_excel_file(vehicle_number, date, filename):
    # Write a one-row table; writing .xlsx files requires an engine such as openpyxl
    if vehicle_number and date:
        data = {'Vehicle Number': [vehicle_number], 'Date': [date]}
        df = pd.DataFrame(data)
        df.to_excel(filename, index=False)
    else:
        print("No data to write to Excel.")

def get_gemini_response(prompt):
    # Request JSON output so the reply can be parsed directly with json.loads
    model = genai.GenerativeModel(
        "gemini-1.5-pro-latest",
        generation_config={"response_mime_type": "application/json"},
    )
    try:
        response = model.generate_content(prompt)
        response_json = json.loads(response.text)
        if response_json:  # Check that the parsed JSON is not empty
            vehicle_number = response_json.get('vehicle_number')
            date = response_json.get('date')
            return vehicle_number, date
        print("Error: Empty response from Gemini")
    except Exception as e:
        print(f"Error fetching Gemini response: {e}")
    return None, None

if __name__ == "__main__":
    # Replace with your API key
    genai.configure(api_key="ACTUAL API KEY")

    pdf_path = 'D:\\XXX\\ExtractVHNumber.pdf'  # Path of the input PDF file
    filename = 'D:\\XXX\\output.xlsx'  # Path of the Excel file to write the output to

    # Content of the test PDF file:
    # "The Vehicle that went has a Vehicle Number: AS01A1234 and it went on Date: 01/01/2024"

    text = extract_text_from_pdf(pdf_path)
    #query = "What is the capital of France?"
    query = ("Extract the Vehicle Number and Date from the following text:\n" + text
             + "\nPlease provide the output in JSON format with keys 'vehicle_number' and 'date'.")

    vehicle_number, date = get_gemini_response(query)

    create_excel_file(vehicle_number, date, filename)
    print("Excel Creation done")

Upvotes: 0

Tanaike

Reputation: 201378

As another approach for now, how about converting the PDF data to images? The Gemini 1.5 API can analyze images. The flow is as follows.

  1. Convert the PDF data to images (PNG or JPEG).
  2. Upload the images to Gemini. Ref
  3. Generate content using the uploaded images. Ref
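A minimal sketch of the three steps above follows. It assumes PyMuPDF (imported as fitz) for rendering pages, which is my choice rather than something the answer specifies, and the google-generativeai SDK for steps 2 and 3. The model name, the 150 DPI default, and the page-NNN.png file names are illustrative assumptions; genai.configure(api_key=...) must be called before ask_about_pages.

```python
import os

# PDF user space is defined at 72 units per inch, so rendering a page at a
# target DPI means scaling it by dpi / 72.
PDF_POINTS_PER_INCH = 72


def dpi_to_zoom(dpi: int) -> float:
    """Return the scale factor that renders a PDF page at the given DPI."""
    return dpi / PDF_POINTS_PER_INCH


def pdf_to_pngs(pdf_path: str, out_dir: str, dpi: int = 150) -> list[str]:
    """Step 1: render every page of the PDF to a PNG file (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so dpi_to_zoom works without it

    os.makedirs(out_dir, exist_ok=True)
    zoom = dpi_to_zoom(dpi)
    paths = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            path = os.path.join(out_dir, f"page-{index + 1:03d}.png")
            pix.save(path)
            paths.append(path)
    return paths


def ask_about_pages(image_paths: list[str], prompt: str) -> str:
    """Steps 2 and 3: upload the page images, then ask Gemini about them."""
    import google.generativeai as genai

    uploaded = [genai.upload_file(path) for path in image_paths]
    # Assumed model name; use whichever Gemini model you have access to
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content([prompt, *uploaded])
    return response.text
```

For example, ask_about_pages(pdf_to_pngs("invoice.pdf", "pages"), "Extract the invoice number and total.") renders each page, uploads the images, and sends them with the prompt in a single request. Rendering at a higher DPI makes small print easier to read, at the cost of larger uploads.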

In my case, I use this approach for parsing various invoices. Ref I expect PDF data to become directly usable in a future update.

Updated on August 14, 2024

PDF data can now be used directly with the Gemini API. Ref

PDF data can be passed either inline as base64 (inlineData) or as file_data, using the URI of a file uploaded to Gemini.
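As a sketch of the two options, the following builds REST request bodies for the generateContent endpoint using only the standard library. The endpoint URL, the API version (v1beta), and the model name are assumptions to check against the current documentation; the field names are written in the REST API's camelCase JSON form (inlineData, fileData, mimeType, fileUri).

```python
import base64
import json
import urllib.request

# Assumed endpoint, API version and model name; adjust to what you use.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-1.5-pro:generateContent?key={api_key}"
)


def inline_pdf_body(pdf_bytes: bytes, prompt: str) -> dict:
    """Request body that sends the PDF itself, base64-encoded (inlineData)."""
    return {
        "contents": [{
            "parts": [
                {"inlineData": {
                    "mimeType": "application/pdf",
                    "data": base64.b64encode(pdf_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }


def file_uri_pdf_body(file_uri: str, prompt: str) -> dict:
    """Request body that references a PDF already uploaded to Gemini (fileData)."""
    return {
        "contents": [{
            "parts": [
                {"fileData": {"mimeType": "application/pdf", "fileUri": file_uri}},
                {"text": prompt},
            ]
        }]
    }


def generate(body: dict, api_key: str) -> dict:
    """POST the body to generateContent and return the parsed JSON response."""
    request = urllib.request.Request(
        ENDPOINT.format(api_key=api_key),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Inline data is convenient for small files sent once; uploading first and referencing the URI avoids re-sending large PDFs with every request. When a call succeeds, the generated text is typically found at result["candidates"][0]["content"]["parts"][0]["text"].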

Upvotes: 3

Linda Lawton - DaImTo

Reputation: 116948

The File API for Gemini doesn't support PDF.

But you can do exactly what AI Studio does: parse the PDF file as text and add it to your prompt.

  1. Read the file as a text string.
  2. Prompt: "Tell me about this text: [dump string]"
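A minimal sketch of these two steps, assuming the pypdf package for text extraction and the google-generativeai SDK for the request. The separator lines around the text and the model name are my own choices, and this only works for PDFs that have a text layer (not scanned images).

```python
def build_prompt(document_text: str, question: str = "Tell me about this text.") -> str:
    """Step 2: put the question first, then the extracted text between separators."""
    return f"{question}\n\n---\n{document_text}\n---"


def pdf_to_text(pdf_path: str) -> str:
    """Step 1: read the PDF as one text string (requires the pypdf package)."""
    from pypdf import PdfReader  # lazy import so build_prompt works without pypdf

    reader = PdfReader(pdf_path)
    # extract_text() returns little or nothing for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def ask_about_pdf(pdf_path: str, question: str) -> str:
    """Send the extracted text plus the question to Gemini and return its answer."""
    import google.generativeai as genai

    # Assumed model name; genai.configure(api_key=...) must be called first
    model = genai.GenerativeModel("gemini-1.5-pro")
    prompt = build_prompt(pdf_to_text(pdf_path), question)
    return model.generate_content(prompt).text
```

Placing the question before the document and delimiting the text makes it clear to the model where the pasted content starts and ends.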

Upvotes: 1
