Reputation: 29096
This question has been asked many times on SO (for instance here), but there is no real answer yet.
I am writing a short command line tool that renders templates. It is driven by a Makefile:
i = $(wildcard *.in)
o = $(patsubst %.in, %.out, $(i))

all: $(o)

%.out: %.in
	./script.py -o $@ $<
In this dummy example, the Makefile parses every .in file to generate the corresponding .out file. It is very convenient for me to use make because I have a lot of other actions to trigger before and after this script. Moreover, I would like to remain as KISS as possible.
Thus, I want to keep my tool simple, stupid and process each file separately using the syntax
script -o out in
My script uses the following:
#!/usr/bin/env python
from jinja2 import Template, nodes
from jinja2.ext import Extension
import hiyapyco
import argparse
import re
...
The problem is that each execution costs me about 1.2s ( ~60ms for the processing and ~1140ms for the import directives):
$ time ./script.py -o foo.out foo.in
real 0m1.625s
user 0m0.452s
sys 0m1.185s
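A quick way to confirm where the time goes (this just imports the same modules the script uses, nothing else) is:
$ time python -c "import jinja2, hiyapyco, argparse, re"
If that alone accounts for most of the 1.2s, the imports really are the bottleneck.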
The overall execution of my Makefile for 100 files is ridiculous: ~100 files x 1.2s = 120s.
As it stands, this is not a workable solution, even though it is exactly how the tool should work.
What alternative can I use?
EDIT
I love Python because its syntax is readable and its community is large. In this particular case (command line tools), I have to admit Perl is still a decent alternative. The same script written in Perl (which is also an interpreted language) is about 12 times faster (using Text::Xslate).
I don't want to promote Perl in any way; I am just trying to solve my biggest issue with Python: it is not yet a suitable language for simple command line tools because of its poor import time.
Upvotes: 1
Views: 2280
Reputation: 9863
It seems quite clear where the problem is. Right now you have:
cost(1 file) = 1.2s = 60ms + 1140ms
which means:
cost(N files) = N * 1.2s
Now, why don't you change it to:
cost1(N files) = 1140ms + N * 60ms
That way, processing 100 files would theoretically take about 7.14s instead of 120s.
EDIT:
Because I'm receiving downvotes on this answer, I'll post a little example. Let's assume you have this Python file:
# foo.py
import sys
import numpy
import cv2

print(sys.argv[0])
The execution time is 1.3s on my box. Now, if I do:
for /l %x in (1, 1, 100) do python foo.py
I'll get 100 * 1.3s of execution time. My proposal was to turn foo.py into this:
import sys
import numpy
import cv2

def whatever_rendering_you_want_to_do(path):
    pass

for path in sys.argv[1:]:
    whatever_rendering_you_want_to_do(path)
That way you're importing only once instead of 100 times.
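Applied to the tool in the question, the same idea could look roughly like this. This is only a sketch: render() is a placeholder for the real jinja2/hiyapyco work, and accepting all the input files on one command line is an assumed interface, not the author's actual script.
#!/usr/bin/env python
# Sketch: render several .in templates in one interpreter run, so the
# expensive imports are paid only once.
import argparse
import os
from jinja2 import Template


def render(in_path, out_path):
    # Placeholder for the real rendering logic (jinja2 + hiyapyco in the question).
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(Template(src.read()).render())


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("inputs", nargs="+", help="one or more .in files")
    args = parser.parse_args()
    for in_path in args.inputs:
        render(in_path, os.path.splitext(in_path)[0] + ".out")


if __name__ == "__main__":
    main()
The Makefile could then call the script once with the whole $(i) list instead of once per .out target, so the import cost is paid a single time.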
Upvotes: 0
Reputation: 34357
Write the template part as a separate process. The first time script.py is run, it would launch this separate process. Once the process exists, it can be passed the input/output filenames via a named pipe. If the process gets no input for x seconds, it automatically exits; how big x is depends on your needs.
So the parameters are passed to the long-running process by script.py writing to a named pipe. The imports only occur once (provided the inputs arrive fairly often), and as BPL points out, this would make everything run faster.
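A minimal sketch of such a worker on a POSIX system (the pipe path, the timeout value and the empty render() body are assumptions for illustration, not part of the answer):
#!/usr/bin/env python
# Long-running worker: reads "in out" pairs from a named pipe and exits
# after IDLE_TIMEOUT seconds without input (POSIX only).
import os
import select

PIPE_PATH = "/tmp/render_worker.fifo"  # hypothetical location
IDLE_TIMEOUT = 30                      # the "x seconds" from the answer


def render(in_path, out_path):
    # The expensive jinja2/hiyapyco work from the question would go here.
    pass


def main():
    if not os.path.exists(PIPE_PATH):
        os.mkfifo(PIPE_PATH)
    # O_RDWR keeps the FIFO readable even when no writer is connected.
    fd = os.open(PIPE_PATH, os.O_RDWR | os.O_NONBLOCK)
    buf = b""
    while True:
        ready, _, _ = select.select([fd], [], [], IDLE_TIMEOUT)
        if not ready:          # nothing arrived for IDLE_TIMEOUT seconds
            break
        buf += os.read(fd, 4096)
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            parts = line.decode().split()
            if len(parts) == 2:
                render(parts[0], parts[1])
    os.close(fd)


if __name__ == "__main__":
    main()
The front end would then only have to write a line such as foo.in foo.out into the pipe, which costs next to nothing compared to re-importing everything.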
Upvotes: 1
Reputation: 91049
It is not quite easy, but you could turn your program into one that sits in the background and waits for commands telling it which file to process.
Another program could then feed the processing commands to it, making each actual invocation very cheap.
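One possible shape for that, assuming Python 3 on a Unix-like system (the socket path and the one-line-per-file protocol are invented for the example):
#!/usr/bin/env python3
# Background process: pays the heavy imports once, then handles every
# "in out" line it receives over a Unix domain socket.
import os
import socketserver

SOCKET_PATH = "/tmp/render_daemon.sock"  # hypothetical


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:
            parts = line.decode().split()
            if len(parts) == 2:
                # ... render parts[0] into parts[1] here ...
                pass


if __name__ == "__main__":
    if os.path.exists(SOCKET_PATH):
        os.remove(SOCKET_PATH)
    with socketserver.UnixStreamServer(SOCKET_PATH, Handler) as server:
        server.serve_forever()
The feeding program only has to connect to the socket and send one line per file, so its own start-up is negligible.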
Upvotes: 3
Reputation: 7211
You could use glob to perform those actions on the files you need.
import glob

in_files = glob.glob('*.in')
out_files = [f[:-3] + '.out' for f in in_files]  # derive each .out name from its .in
Thus, you process all the files in the same script instead of calling the script once for every pair of files. At least that way you don't have to start Python every time.
Upvotes: 0