Reputation: 3782
I am wondering what the most efficient methods are for reading/writing image and PDF files as numpy arrays for processing.
So far I have seen scipy.ndimage.imread and PIL combined with numpy, which yield the following results:
import os
import glob
from scipy.ndimage import imread  # note: deprecated in SciPy 1.0 and removed in 1.2
from PIL import Image
import numpy as np
import timeit

iters = 2

def scipy_fun():
    # read every JPEG in the current directory with scipy
    for x in glob.glob("*.jpg"):
        px = imread(x)

def PIL_fun():
    # read every JPEG with PIL, then convert to a numpy array
    for x in glob.glob("*.jpg"):
        with Image.open(x) as im:
            px = np.array(im)

print(timeit.Timer(scipy_fun).timeit(number=iters))
print(timeit.Timer(PIL_fun).timeit(number=iters))
Running the script shows similar results, with scipy marginally faster (total time in seconds, scipy first, then PIL):
2.8794324089019234
3.0174482765699095
Are there any faster ways to do this?
Upvotes: 3
Views: 1983
Reputation: 1184
First, install pdf2image (it relies on poppler being installed on your system):
pip install pdf2image
Then,
import numpy as np
from pdf2image import convert_from_path as read
import PIL
import cv2
# PDF page as a numpy array to play around with in OpenCV or PIL
img = np.asarray(read('path to the pdf file')[0])  # first page of the PDF
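Since the question also asks about writing, here is a rough sketch of converting every page rather than just the first, and of saving an array back to disk with PIL. The helper names `pages_to_arrays` and `save_array` are made up for illustration, and the two synthetic images only stand in for the list of PIL images that `convert_from_path` returns, so the snippet runs without a PDF or poppler.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def pages_to_arrays(pages):
    """Convert a list of PIL images into a list of numpy arrays.

    Hypothetical helper: `pages` could be the list returned by
    pdf2image.convert_from_path.
    """
    return [np.asarray(page) for page in pages]


def save_array(arr, path):
    """Write a uint8 numpy array to disk as an image via PIL (hypothetical helper)."""
    Image.fromarray(arr).save(path)


# Stand-in for PDF pages: two small synthetic RGB images (made-up data).
# PIL sizes are (width, height); the arrays come out as (height, width, channels).
pages = [
    Image.new("RGB", (4, 3), color=(255, 0, 0)),
    Image.new("RGB", (4, 3), color=(0, 0, 255)),
]
arrays = pages_to_arrays(pages)

# Round trip through a lossless format (PNG) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "page0.png")
    save_array(arrays[0], out_path)
    with Image.open(out_path) as im:
        restored = np.array(im)
```

With a real PDF you would pass `read('path to the pdf file')` as `pages`. Saving to JPEG instead of PNG is lossy, so the array read back would generally not match the original exactly.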
Upvotes: 1