Reputation: 457
I am trying to extract information from a PDF using the Gemini API (1.5-pro). Right now it seems the API can only accept text, audio files, or images. Is there any way to give it PDF files (e.g. via Vertex AI or Google Drive uploads)? The current documentation is rather unclear.
Upvotes: 5
Views: 6114
Reputation: 11
See if the code below helps. In my case, I extracted a vehicle number and a date from a PDF file and wrote them to an Excel file. I used Gemini, but the logic should be the same for other tools such as ChatGPT (though I'm not sure):
__________
__________
import json

import google.generativeai as genai
import pandas as pd
from PyPDF2 import PdfReader


def extract_text_from_pdf(pdf_path):
    """Return the concatenated text of every page in the PDF."""
    with open(pdf_path, 'rb') as pdf_file:
        pdf_reader = PdfReader(pdf_file)
        text = ""
        for page in pdf_reader.pages:
            # extract_text() may return nothing for pages without a text layer
            text += page.extract_text() or ""
    return text


def create_excel_file(vehicle_number, date, filename):
    """Write the extracted values to a one-row Excel sheet."""
    if vehicle_number and date:
        data = {'Vehicle Number': [vehicle_number], 'Date': [date]}
        df = pd.DataFrame(data)
        df.to_excel(filename, index=False)
    else:
        print("No data to write to Excel.")


def get_gemini_response(prompt):
    """Send the prompt to Gemini and return (vehicle_number, date)."""
    model = genai.GenerativeModel("gemini-1.5-pro-latest", generation_config={"response_mime_type": "application/json"})
    try:
        # Call the API once, inside the try block, so failures are caught
        response = model.generate_content(prompt)
        response_json = json.loads(response.text)
        if response_json:  # Check that response_json is not empty
            vehicle_number = response_json.get('vehicle_number', None)
            date = response_json.get('date', None)
            return vehicle_number, date
        else:
            print("Error: Empty response from Gemini")
            return None, None
    except Exception as e:
        print(f"Error fetching Gemini response: {e}")
        return None, None


if __name__ == "__main__":
    # Replace with your API key
    genai.configure(api_key="ACTUAL API KEY")
    pdf_path = 'D:\\XXX\\ExtractVHNumber.pdf'  # Path of the PDF file
    filename = 'D:\\XXX\\output.xlsx'  # Path of the Excel file where the output is required
    # Content of the test PDF file:
    # "The Vehicle that went has a Vehicle Number: AS01A1234 and it went on Date: 01/01/2024"
    text = extract_text_from_pdf(pdf_path)
    query = "Extract the Vehicle Number and Date from the following text:\n" + text + "\nPlease provide the output in JSON format with keys 'vehicle_number' and 'date'."
    vehicle_number, date = get_gemini_response(query)
    create_excel_file(vehicle_number, date, filename)
    print("Excel Creation done")
Upvotes: 0
Reputation: 201378
As another approach for now, how about converting the PDF pages to images? The Gemini 1.5 API can analyze images. The flow is as follows.
In my case, I use this approach for parsing various invoices. Ref I expect that PDF data will become usable directly in a future update.
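A rough sketch of the image-conversion approach described above. The answer does not say which library it used for rendering, so PyMuPDF (imported as `fitz`) is an assumption here, and both helper names are hypothetical. The second function only builds the `parts` list of a request (images first, then the prompt); it does not call the API.

```python
import base64


def render_pdf_pages_to_png(pdf_path, dpi=150):
    """Render each page of a PDF to PNG bytes.

    Assumption: PyMuPDF is installed (pip install pymupdf). It is imported
    lazily so the rest of this sketch works without it.
    """
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return [page.get_pixmap(dpi=dpi).tobytes("png") for page in doc]


def build_image_parts(png_pages, prompt):
    """Build request parts: one base64 inline image per page, then the text prompt."""
    parts = [
        {
            "inline_data": {
                "mime_type": "image/png",
                "data": base64.b64encode(png).decode("ascii"),
            }
        }
        for png in png_pages
    ]
    parts.append({"text": prompt})
    return parts
```

Something like `build_image_parts(render_pdf_pages_to_png("invoice.pdf"), "Extract the invoice number and total.")` would then give the parts to send. Rendering at a higher `dpi` may help with small print, at the cost of larger requests.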
At the current stage, PDF data can be used directly with the Gemini API. Ref
The PDF data can be provided either as inlineData (base64-encoded) or as file_data (the URI of a file uploaded to Gemini).
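To illustrate the two options, here is a minimal sketch that builds the JSON body of a `generateContent` request in each form. The helper names are hypothetical, and the snake_case field names (`inline_data`, `file_data`, `mime_type`, `file_uri`) are written as I understand the REST API to accept them, so check them against the current API reference. Nothing is sent in this sketch.

```python
import base64
import json


def pdf_inline_part(pdf_bytes):
    """PDF passed inline as base64-encoded data."""
    return {
        "inline_data": {
            "mime_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode("ascii"),
        }
    }


def pdf_file_part(file_uri):
    """PDF referenced by the URI of a file already uploaded to Gemini."""
    return {"file_data": {"mime_type": "application/pdf", "file_uri": file_uri}}


def build_request_body(pdf_part, prompt):
    """Combine one PDF part and a text prompt into a request body."""
    return {"contents": [{"parts": [pdf_part, {"text": prompt}]}]}


if __name__ == "__main__":
    # Placeholder bytes; in practice read them with open(path, "rb").read()
    body = build_request_body(pdf_inline_part(b"%PDF-1.4 ..."), "Summarize this document.")
    print(json.dumps(body, indent=2))
```

Inline data keeps everything in a single request, which suits small files. The uploaded-file form avoids resending the same bytes when one PDF is queried several times.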
Upvotes: 3
Reputation: 116948
The File API for Gemini doesn't support PDF.
What you can do, though, is exactly what AI Studio does: parse the PDF file as text and then add that text as part of your prompt.
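A minimal sketch of that idea, assuming the `pypdf` package for text extraction (any PDF text extractor would do). The function names and the delimiter lines in the prompt are illustrative choices of mine, not anything AI Studio is known to use.

```python
def extract_pdf_text(pdf_path):
    """Return the text of every page, joined by newlines.

    Assumption: pypdf is installed (pip install pypdf). It is imported
    lazily so build_prompt() can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() may return nothing for pages without a text layer
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def build_prompt(question, pdf_text):
    """Place the extracted PDF text after the question, clearly delimited."""
    return (
        f"{question}\n\n"
        "--- PDF text ---\n"
        f"{pdf_text.strip()}\n"
        "--- end of PDF text ---"
    )
```

The resulting string can be passed to the model as an ordinary text prompt. Note that scanned PDFs without a text layer will produce little or no text this way.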
Upvotes: 1