Reputation: 51
I want to extract all tables from a PDF using Camelot in Python 3.
import camelot
# PDF file to extract tables from
file = "./pdf_file/ooo.pdf"
tables = camelot.read_pdf(file)
# number of tables extracted
print("Total tables extracted:", tables.n)
# print the first table as Pandas DataFrame
print(tables[0].df)
# export individually
tables[0].to_csv("./pdf_file/ooo.csv")
But I only get one table, the one on the first page of the PDF. How can I extract all the tables from the PDF file?
Upvotes: 3
Views: 9276
Reputation: 71
To extract the tables from every page with Camelot, pass pages="all" to read_pdf. The flavor="stream" parameter can also help: stream detects tables from the whitespace between text instead of from ruling lines, so it finds tables without cell borders that the default lattice flavor misses. If the extraction is still poor, tune the stream-specific row_tol and edge_tol parameters, for example row_tol=0 and edge_tol=500.
import camelot
# file_path is the path to your PDF; pages="all" reads every page
pdf_archive = camelot.read_pdf(file_path, pages="all", flavor="stream")
for pdf_table in pdf_archive:
    print(pdf_table.df)
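To also save every table, not just tables[0] as in the question, each table can be written to its own CSV. Camelot's TableList has an export method for this (for example tables.export("out.csv", f="csv")); the sketch below does the same by hand so the file naming is explicit. It assumes only that each item has a .page attribute and a .df pandas DataFrame, which is what camelot.read_pdf returns; export_all_tables is a hypothetical helper name.

```python
from pathlib import Path


def export_all_tables(tables, out_dir):
    """Write each table to its own CSV file and return the written paths.

    Hypothetical helper: `tables` can be any iterable whose items have a
    `.page` attribute and a `.df` DataFrame, such as a Camelot TableList.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Count tables per page so two tables on the same page get distinct names.
    per_page = {}
    for table in tables:
        per_page[table.page] = per_page.get(table.page, 0) + 1
        path = out / f"page-{table.page}-table-{per_page[table.page]}.csv"
        # Camelot's DataFrames use numeric labels, so skip index and header.
        table.df.to_csv(path, index=False, header=False)
        written.append(path)
    return written
```

Usage, with the question's file: tables = camelot.read_pdf("./pdf_file/ooo.pdf", pages="all", flavor="stream"), then export_all_tables(tables, "./pdf_file/tables").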
Upvotes: 3
Reputation: 3536
tables = camelot.read_pdf(file, pages='1-end')
If the pages parameter is not specified, Camelot analyzes only the first page (the default is pages="1"). For details, see the official documentation.
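The pages value is a string of comma-separated page numbers and ranges, where "end" stands for the last page and "all" for every page. The sketch below shows how such a string maps to page numbers; expand_pages is a hypothetical illustration of that syntax, not a function Camelot exposes.

```python
from typing import List


def expand_pages(spec: str, num_pages: int) -> List[int]:
    """Expand a Camelot-style pages string ("1", "1,3-4", "1-end", "all").

    Hypothetical helper for illustration; returns sorted, unique page numbers.
    """
    if spec == "all":
        return list(range(1, num_pages + 1))
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "2-5" or "2-end" (inclusive on both ends).
            start, end = part.split("-")
            last = num_pages if end == "end" else int(end)
            pages.extend(range(int(start), last + 1))
        else:
            # A single page number, or "end" for the last page.
            pages.append(num_pages if part == "end" else int(part))
    return sorted(set(pages))
```

So the default "1" covers only the first page, while "1-end" and "all" cover the same pages.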
Upvotes: 3