Reputation: 175
I have a Python script that needs to load a large file from disk into a variable. This takes a while. The script will be called many times from another application (still unknown) with different options, and its stdout will be used. Is there any way to avoid re-reading the large file on every single call of the script?
I guess I could have one long-running script in the background that holds the variable. But then, how can I call the script with different options and read its stdout from the other application?
Upvotes: 1
Views: 2913
Reputation: 44444
Since you can already read the data into a variable, you might consider memory-mapping the file using mmap. This is safe as long as multiple processes are only reading it; supporting a writer would require a locking protocol.
Assuming you are not familiar with memory-mapped objects, I'll wager you use them every day: this is how the operating system loads and maintains executable files. Essentially your file becomes part of the paging system, although it does not have to be in any special format.
When you read a file into memory it is unlikely to be loaded entirely into RAM; it will be paged out when "real" RAM becomes over-subscribed, and this paging is often a considerable overhead. A memory-mapped file is just your data "ready paged". There is no overhead in reading it into (virtual) memory; it is there as soon as you map it.
When you try to access the data, a page fault occurs and a subset (a page) is loaded into RAM. This is all done by the operating system; the programmer is unaware of it.
While a file remains mapped it is connected to the paging system. Another process mapping the same file will access the same object, provided changes have not been made (see MAP_SHARED).
It needs a daemon to keep the memory-mapped object current in the kernel, but other than creating the object linked to the physical file, the daemon does not need to do anything else: it can sleep or wait on a shutdown signal.
Other processes open the file (use os.open()) and map the object.
See the examples in the mmap module documentation, and also the question Giving access to shared memory after child processes have already started
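To make the split concrete, here is a minimal sketch of the two sides. The path /var/data/big.bin and the file names daemon.py and worker.py are illustrative placeholders, not from the question.

# daemon.py: create the mapping once and keep the process alive
import mmap
import os
import signal

fd = os.open("/var/data/big.bin", os.O_RDONLY)
mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)

signal.pause()  # sleep until a shutdown signal; the mapping stays live

# worker.py: run per call; it maps the same file, so pages already
# cached by the OS are reused instead of being re-read from disk
import mmap
import os

fd = os.open("/var/data/big.bin", os.O_RDONLY)
data = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
print(data[:100])  # only the touched pages are faulted into RAM
data.close()
os.close(fd)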
Upvotes: 1
Reputation: 8653
(I misunderstood the original question, but the first answer I wrote has a different solution, which might be useful to someone in that scenario, so I am keeping that one as-is and proposing a second solution.)
For a single machine, OS-provided pipes are the best solution for what you are looking for.
Essentially you create a forever-running Python process that reads commands from a pipe, processes them, and then prints to stdout (see the sketch after the quoted example below).
Reference: http://kblin.blogspot.com/2012/05/playing-with-posix-pipes-in-python.html
From the above-mentioned source:
Workload
In order to simulate my workload, I came up with the following simple script called pipetest.py that takes an output file name and then writes some text into that file.
#!/usr/bin/env python
import sys

def main():
    pipename = sys.argv[1]
    with open(pipename, 'w') as p:
        p.write("Ceci n'est pas une pipe!\n")

if __name__ == "__main__":
    main()
The Code
In my test, this "file" will be a FIFO created by my wrapper code. The implementation of the wrapper code is as follows; I will go over the code in detail further down this post:
#!/usr/bin/env python
import tempfile
import os
from os import path
import shutil
import subprocess

class TemporaryPipe(object):
    def __init__(self, pipename="pipe"):
        self.pipename = pipename
        self.tempdir = None

    def __enter__(self):
        self.tempdir = tempfile.mkdtemp()
        pipe_path = path.join(self.tempdir, self.pipename)
        os.mkfifo(pipe_path)
        return pipe_path

    def __exit__(self, type, value, traceback):
        if self.tempdir is not None:
            shutil.rmtree(self.tempdir)

def call_helper():
    with TemporaryPipe() as p:
        script = "./pipetest.py"
        subprocess.Popen(script + " " + p, shell=True)
        with open(p, 'r') as r:
            text = r.read()
        return text.strip()

def main():
    call_helper()

if __name__ == "__main__":
    main()
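The quoted code only demonstrates the FIFO plumbing. The forever-running process described above could look roughly like the sketch below; load_large_file(), handle() and the path /tmp/cmd_pipe are hypothetical placeholders, not part of the original post.

#!/usr/bin/env python
# daemon sketch: pay the expensive load once, then serve commands from a FIFO
import os

PIPE = "/tmp/cmd_pipe"  # assumed location of the command pipe

def load_large_file():
    # placeholder for the slow load described in the question
    with open("/path/to/large/file", "rb") as f:
        return f.read()

def handle(command, data):
    # placeholder: compute whatever a single script call would print
    return "processed %s against %d bytes" % (command, len(data))

def main():
    data = load_large_file()  # done once, at startup
    if not os.path.exists(PIPE):
        os.mkfifo(PIPE)
    while True:
        # open() blocks until a writer connects; each writer sends commands
        with open(PIPE) as pipe:
            for line in pipe:
                print(handle(line.strip(), data))

if __name__ == "__main__":
    main()

The calling application then writes its options into /tmp/cmd_pipe (e.g. echo "--option foo" > /tmp/cmd_pipe) instead of launching a new Python process each time; in a real setup you would likely send replies back through a second pipe rather than the daemon's stdout.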
Upvotes: 1
Reputation: 8653
You can store the processed values in a file and then read them from that file in another script.
>>> import pickle as p
>>> mystr = "foobar"
>>> p.dump(mystr, open('/tmp/t.txt', 'wb'))
>>> mystr2 = p.load(open('/tmp/t.txt', 'rb'))
>>> mystr2
'foobar'
Upvotes: 0
Reputation: 2788
Make it a (web) microservice: formalize all the different CLI arguments as HTTP endpoints and send requests to it from the main application.
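A minimal sketch of that idea using only the Python 3 standard library; the port 8000, the option query parameter, and the data path are illustrative assumptions:

#!/usr/bin/env python3
# microservice sketch: load the large file once, then serve each request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

DATA = open("/path/to/large/file", "rb").read()  # paid once, at startup

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # each former CLI option becomes a query parameter
        query = parse_qs(urlparse(self.path).query)
        option = query.get("option", [""])[0]
        body = ("option=%s, data=%d bytes\n" % (option, len(DATA))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()

The main application then replaces a call like python script.py --option foo with an HTTP GET to http://127.0.0.1:8000/?option=foo and reads the response body where it previously read stdout.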
Upvotes: 4