Reputation: 41
As mentioned in the Camelot documentation, we can extract a table from a particular region like this:
tables = camelot.read_pdf('table_regions.pdf', table_regions=['170,370,560,270'])
But how can I find these regions for my own PDF?
Upvotes: 4
Views: 6001
Reputation: 1
If you just want to see which table region Camelot is reading, try this in a Jupyter Notebook.
First, read the file with the .read_pdf method:
tables = camelot.read_pdf('table_regions.pdf', table_regions=['170,370,560,270'], flavor='lattice')
Pay attention to the flavor, because it has to match whether the table has border lines or not: use lattice for tables with borders and stream for tables whose columns are separated by whitespace.
Then plot the detected table:
camelot.plot(tables[index], kind='contour')
(contour is one of the visual debugging plots. To see how many tables were found, and so which index values are valid, evaluate the name of the object, e.g. tables, in an .ipynb cell.)
Finally, inspect the extracted data:
tables[index].df
Upvotes: 0
Reputation: 182
I know it's a late reply, but I just came across a possible solution.
If you're looking for an automated extraction method, you could run the lattice flavor in a first step, retrieve the table boundaries with tables[0]._bbox, and pass these numbers to the table_areas argument in a second call to camelot.read_pdf().
Be aware that the values in _bbox are sorted in an unusual order for a bounding box, so check the order before passing them on.
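A minimal sketch of that two-pass idea, under some assumptions: _bbox is a private attribute, so its layout may change between Camelot versions, and here it is assumed to hold (x1, y1, x2, y2) with (x1, y1) as the bottom-left and (x2, y2) as the top-right corner, while table_areas expects "x1,y1,x2,y2" with the top-left corner first and the bottom-right corner second. The helper names bbox_to_table_area and extract_with_detected_areas are made up for this example and are not part of Camelot.

```python
import math


def bbox_to_table_area(bbox):
    """Turn a (x1, y1, x2, y2) bbox into a 'left,top,right,bottom' string.

    Assumption: the input corners are bottom-left and top-right, but the
    min/max calls below make the result independent of the corner order.
    Values are rounded outward so the area never cuts into the table.
    """
    x1, y1, x2, y2 = bbox
    left = math.floor(min(x1, x2))
    right = math.ceil(max(x1, x2))
    bottom = math.floor(min(y1, y2))
    top = math.ceil(max(y1, y2))
    return f"{left},{top},{right},{bottom}"


def extract_with_detected_areas(pdf_path, second_flavor="stream"):
    """Hypothetical wrapper: detect tables with lattice, then re-read them."""
    import camelot  # imported here so the helper above works without Camelot

    # First pass: let the lattice flavor find the table boundaries.
    detected = camelot.read_pdf(pdf_path, flavor="lattice")
    # _bbox is private and undocumented, so treat this as a best-effort read.
    areas = [bbox_to_table_area(table._bbox) for table in detected]
    if not areas:
        return detected

    # Second pass: restrict extraction to the detected areas.
    return camelot.read_pdf(pdf_path, flavor=second_flavor, table_areas=areas)
```

With the region from the question, bbox_to_table_area((170, 270, 560, 370)) gives '170,370,560,270'. It is worth plotting the result with the visual debugging tools before relying on it, since the corner order is an assumption.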
Upvotes: 2
Reputation: 3536
You can find these regions with some visual debugging:
https://camelot-py.readthedocs.io/en/master/user/advanced.html#visual-debugging
Upvotes: 2