Shubham Mishra

Reputation: 41

How to find table region for camelot

As mentioned in the camelot docs, we can extract tables from a particular region like this:

tables = camelot.read_pdf('table_regions.pdf', table_regions=['170,370,560,270'])

But how can I find these regions for my PDF?

Upvotes: 4

Views: 6001

Answers (3)

studTon

Reputation: 1

If you just want to detect the table region you are reading, try this in a Jupyter Notebook:

  1. Define the table region in the .read_pdf call: tables = camelot.read_pdf('table_regions.pdf', table_regions=['170,370,560,270'], flavor='lattice'). Pay attention to the flavor, because it says whether the table has border lines or not (lattice for tables with borders, stream for tables separated by whitespace).
  2. Plot the result with camelot-py's matplotlib-based plot function: camelot.plot(tables[index], kind='contour'). (You can see how many tables the object holds by evaluating its name, e.g. running tables in an .ipynb cell. contour is one of camelot's visual debugging plots.)
  3. The plot shows an image of your table with a red rectangular contour. Adjust the coordinates in step 1 and repeat step 2 until the contour covers the table region you want to extract.
  4. To check that the data is correct, look at tables[index].df.
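
The steps above could be wrapped up roughly like this. It is only a sketch under a few assumptions: region_string and preview_region are helper names made up for illustration (they are not part of camelot), the file name and coordinates are the ones from the question, and camelot-py plus matplotlib are assumed to be installed.

```python
def region_string(x1, y1, x2, y2):
    """Build the 'x1,y1,x2,y2' string used by table_regions.

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner,
    in PDF coordinates (origin at the bottom-left of the page), so y1 > y2.
    Hypothetical helper, not part of camelot.
    """
    if x1 >= x2 or y1 <= y2:
        raise ValueError("expected x1 < x2 and y1 > y2 (top-left, then bottom-right)")
    # ':g' keeps up to six significant digits and drops trailing zeros.
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"


def preview_region(pdf_path, region, flavor="lattice", index=0):
    """Steps 1, 2 and 4: read with a candidate region, plot it, return the data."""
    import camelot  # imported lazily so region_string works without camelot

    # Step 1: read the PDF, restricting detection to the candidate region.
    tables = camelot.read_pdf(pdf_path, table_regions=[region], flavor=flavor)
    if len(tables) == 0:
        return None  # nothing detected here; try other coordinates

    # Step 2: draw the detected table boundary over the page.
    fig = camelot.plot(tables[index], kind="contour")
    fig.show()  # in a notebook the figure is usually rendered anyway

    # Step 4: hand back the DataFrame so the contents can be checked.
    return tables[index].df
```

For example, preview_region('table_regions.pdf', region_string(170, 370, 560, 270)) would try the region from the question; step 3 then amounts to calling it again with different coordinates until the contour sits where you want it.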

Upvotes: 0

Benedict Witzenberger

Reputation: 182

I know it's a late reply - but I just came across a possible solution.

If you're looking for an automated extraction method, you could use lattice in a first step, retrieve the table boundaries with tables[0]._bbox, and pass these numbers to a second camelot.read_pdf() call as the table_areas argument.

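Be aware that the values in _bbox come in a weirdly sorted order for a bbox: table_areas expects the top-left corner first and then the bottom-right corner, so you may need to reorder them.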
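
A rough sketch of this two-pass idea follows. Several things here are assumptions rather than documented behaviour: _bbox is a private attribute and may change between camelot versions, the code assumes it holds (x1, y_bottom, x2, y_top), i.e. lower-left corner first, and the function names and the choice of stream for the second pass are only for illustration.

```python
def bbox_to_table_area(bbox):
    """Reorder a bounding box into the 'x1,y1,x2,y2' string for table_areas.

    Assumption: bbox is (x1, y_bottom, x2, y_top) in PDF coordinates.
    table_areas wants the top-left corner first, then the bottom-right one.
    Hypothetical helper, not part of camelot.
    """
    x1, y_bottom, x2, y_top = bbox
    # ':g' keeps up to six significant digits and drops trailing zeros.
    return f"{x1:g},{y_top:g},{x2:g},{y_bottom:g}"


def extract_in_two_passes(pdf_path, second_flavor="stream"):
    """Detect tables with lattice, then re-read using the detected areas."""
    import camelot  # imported lazily so the helper above works without camelot

    # First pass: let lattice find the tables on its own.
    first = camelot.read_pdf(pdf_path, flavor="lattice")

    # Collect each detected boundary, reordered for table_areas.
    areas = [bbox_to_table_area(first[i]._bbox) for i in range(len(first))]
    if not areas:
        return first  # nothing detected, so there is nothing to re-read

    # Second pass: restrict extraction to the detected areas.
    return camelot.read_pdf(pdf_path, flavor=second_flavor, table_areas=areas)
```

It is worth printing tables[0]._bbox once and comparing it with a contour plot before relying on the reordering, since the ordering used here is an assumption.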

Upvotes: 2

You can detect these regions with some visual debugging:

https://camelot-py.readthedocs.io/en/master/user/advanced.html#visual-debugging

Upvotes: 2
