Reputation: 133
I was using pdfminer to extract data to an XML file when I noticed that the objects' coordinates were represented in the following way:
<textline bbox="187.098,693.242,288.642,709.202">
How can I convert these coordinates to a pixel system (x, y) in Python, given that I have already parsed the bbox data into a Python variable?
Upvotes: 2
Views: 6108
Reputation: 397
The other answer did not work for me as written, so I wrote my own function based on it. Its scaling is correct, but it does not flip the y coordinates: PDF coordinates start at the bottom-left corner of the page, while image coordinates start at the top-left. You therefore have to subtract the PDF y coordinates from the page height and only then scale them. My function takes three arguments: src is a tuple of 4 floats (x0, y0, x1, y1) holding the bounding-box coordinates, and pdf is the object read by the pdfrw library:
from pdfrw import PdfReader
pdf = PdfReader(<path>)
and im is the PIL object obtained from the image:
from PIL import Image
im = Image.open(<path>)
The function:
def TranslatePoints(src, pdf, im):
    sx0, sy0, sx1, sy1 = src
    # Page size in PDF units. pages[1] is the second page (the list is 0-indexed);
    # use pages[0] for the first page. float() also handles sizes such as "595.28".
    ssx, ssy = (float(pdf.pages[1].MediaBox[2]), float(pdf.pages[1].MediaBox[3]))
    dsx, dsy = im.size
    # Flip y: the PDF origin is bottom-left, the image origin is top-left.
    sy01 = ssy - sy1
    sy11 = ssy - sy0
    # Scale from page units to image pixels.
    x0 = sx0 / ssx * dsx
    x1 = sx1 / ssx * dsx
    y0 = sy01 / ssy * dsy
    y1 = sy11 / ssy * dsy
    return (x0, y0, x1, y1)
To check, I cropped the image with the translated box, and it works just fine:
im.crop(TranslatePoints(src, pdf, im))
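The same flip-then-scale idea can be sketched without pdfrw or PIL, which may make it easier to test. This is only a minimal sketch: the name pdf_bbox_to_image_box is hypothetical, and the page size (612 x 792, US Letter) and image size used in the example are assumptions, not values taken from the question.

```python
def pdf_bbox_to_image_box(bbox, page_size, image_size):
    """Map a PDF bbox (origin bottom-left) to an image box (origin top-left).

    bbox is (x0, y0, x1, y1) in PDF units, page_size is (width, height) in
    PDF units and image_size is (width, height) in pixels.
    """
    x0, y0, x1, y1 = bbox
    page_w, page_h = page_size
    img_w, img_h = image_size
    scale_x = img_w / page_w
    scale_y = img_h / page_h
    # Flip y first: the upper edge of the box (y1) ends up as the smaller image y.
    left = x0 * scale_x
    top = (page_h - y1) * scale_y
    right = x1 * scale_x
    bottom = (page_h - y0) * scale_y
    return left, top, right, bottom


# Assumed example: a US Letter page rendered to a 1224 x 1584 pixel image.
bbox = (187.098, 693.242, 288.642, 709.202)
box = pdf_bbox_to_image_box(bbox, (612, 792), (1224, 1584))
```

The result is ordered (left, top, right, bottom), which is the order im.crop expects.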
Upvotes: 2
Reputation: 365
First, let's understand what the bounding-box numbers generated by pdfminer mean. The bounding box is a list of 4 numbers organized as (x0, y0, x1, y1), where (x0, y0) is the bottom-left corner and (x1, y1) is the top-right corner of the box. Basically, we can use those two points to draw a rectangle around the LT object.
A very important detail is that the origin of the PDF page lies at the bottom left, not the top left as in images (see https://github.com/euske/pdfminer/issues/19). The default resolution of a PDF page is 72 units per inch (see https://github.com/euske/pdfminer/issues/74), so for an image rendered at 72 dpi one PDF unit corresponds to one pixel.
Knowing this, you can implement a function that translates a point or the points of a rectangle to a new coordinate system (for example, one produced by rendering at a different dpi) as follows:
def TranslatePoints(src, srcSize, dstSize):
    sx0, sy0, sx1, sy1 = src
    ssx, ssy = srcSize
    dsx, dsy = dstSize
    dx0 = sx0 / ssx * dsx
    dx1 = sx1 / ssx * dsx
    dy0 = sy0 / ssy * dsy
    dy1 = sy1 / ssy * dsy
    return dx0, dy0, dx1, dy1
src: the source rectangle, a tuple of 4 floats (x0, y0, x1, y1) holding the bounding-box coordinates.
srcSize: the size of the PDF page as a tuple of two numbers (width, height), which you can get through my answer https://stackoverflow.com/a/48886525/3022413 (never mind the downvotes; it works and I tested it).
dstSize: the size of the target coordinate system as a tuple (width, height), i.e. the upper limits of the new coordinate system, such as the size of the image you want to draw the rectangles on.
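When the target image is produced by rendering the page at a known dpi, the scale factor follows directly from the 72 units per inch mentioned above. Below is a minimal sketch of that case. The name points_to_pixels is hypothetical, the page height of 792 units (US Letter) is an assumption, and unlike the function above it also flips the y axis so the result uses a top-left origin.

```python
PDF_UNITS_PER_INCH = 72  # default size of a PDF unit: 1/72 inch


def points_to_pixels(bbox, page_height, dpi):
    """Convert a PDF bbox (x0, y0, x1, y1) to integer pixel coordinates.

    The result is (left, top, right, bottom) with the origin at the top-left,
    for a page rendered at the given dpi.
    """
    scale = dpi / PDF_UNITS_PER_INCH
    x0, y0, x1, y1 = bbox
    return (
        round(x0 * scale),
        round((page_height - y1) * scale),  # upper edge of the box
        round(x1 * scale),
        round((page_height - y0) * scale),  # lower edge of the box
    )


# Assumed example: the bbox from the question on a 792-unit-high page.
bbox = (187.098, 693.242, 288.642, 709.202)
at_72_dpi = points_to_pixels(bbox, 792, 72)
at_144_dpi = points_to_pixels(bbox, 792, 144)
```

At 72 dpi the scale factor is 1, so only the y flip and the rounding change the values.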
Upvotes: 1