Kent Munthe Caspersen

Reputation: 6938

Calling multiple linux processes in Python and collecting output

From a Python script, I need to call a PL->EN translation service. The translation requires 3 steps: tokenization, translation, and detokenization.

From Linux, I can achieve this with 3 processes by running the following commands in the order given:

/home/nlp/opt/moses/scripts/tokenizer/tokenizer.perl -l pl < path_to_input.txt > path_to_output.tok.txt

/home/nlp/opt/moses/bin/moses -f /home/nlp/Downloads/TED/tuning/moses.tuned.ini.1 -drop-unknown -input-file path_to_output.tok.txt -th 8 > path_to_output.trans.txt

/home/nlp/opt/moses/scripts/tokenizer/detokenizer.perl -l en < path_to_output.trans.txt > path_to_output.final.txt

which translates the file path_to_input.txt and outputs to path_to_output.final.txt

I have made the following script for combining the 3 processes:

import shlex
import subprocess

class Translator:
    @staticmethod
    def pl_to_en(input_file, output_file):
        # Tokenize
        print("Tokenization started")
        with open("tokenized.txt", "w+") as tokenizer_output:
            with open(input_file) as tokenizer_input:
                cmd = "/home/nlp/opt/moses/scripts/tokenizer/tokenizer.perl -l pl"
                args = shlex.split(cmd)
                p = subprocess.Popen(args, stdin=tokenizer_input, stdout=tokenizer_output)
                p.wait()
                print("Tokenization finished")

        # Translate
        print("Translation started")
        with open("translated.txt", "w+") as translator_output:
            cmd = "/home/nlp/opt/moses/bin/moses -f /home/nlp/Downloads/TED/tuning/moses.tuned.ini.1 -drop-unknown -input-file tokenized.txt -th 8"
            args = shlex.split(cmd)
            p = subprocess.Popen(args, stdout=translator_output)
            p.wait()
            print("Translation finished")

        # Detokenize
        print("Detokenization started")
        with open("translated.txt") as detokenizer_input:
            with open("detokenized.txt", "w+") as detokenizer_output:
                cmd = "/home/nlp/opt/moses/scripts/tokenizer/detokenizer.perl -l en"
                args = shlex.split(cmd)
                p = subprocess.Popen(args, stdin=detokenizer_input, stdout=detokenizer_output)
                p.wait()
                print("Detokenization finished")

translator = Translator()
translator.pl_to_en("some_input_file.txt", "some_output_file.txt")

But only the tokenization part works. The translator just outputs an empty file, translated.txt. Looking at the output in the terminal, the translator appears to load tokenized.txt correctly and to perform a translation. The problem is just how I collect the output from that process.

Upvotes: 0

Views: 117

Answers (1)

Joe Young

Reputation: 5875

I would try something like the following: send the output of the translator process to a pipe, and make the pipe the detokenizer's input instead of going through the intermediate file.

import shlex
import subprocess

class Translator:
    @staticmethod
    def pl_to_en(input_file, output_file):
        # Tokenize
        print("Tokenization started")
        with open("tokenized.txt", "w+") as tokenizer_output:
            with open(input_file) as tokenizer_input:
                cmd = "/home/nlp/opt/moses/scripts/tokenizer/tokenizer.perl -l pl"
                args = shlex.split(cmd)
                p = subprocess.Popen(args, stdin=tokenizer_input, stdout=tokenizer_output)
                p.wait()
                print("Tokenization finished")

        # Translate
        print("Translation started")
        cmd = "/home/nlp/opt/moses/bin/moses -f /home/nlp/Downloads/TED/tuning/moses.tuned.ini.1 -drop-unknown -input-file tokenized.txt -th 8"
        args = shlex.split(cmd)
        translate_p = subprocess.Popen(args, stdout=subprocess.PIPE)

        # Detokenize, reading the translator's output straight from the pipe.
        # Note: start the detokenizer *before* waiting on the translator --
        # calling translate_p.wait() first can deadlock once the pipe buffer
        # fills up, because nothing is reading from the other end yet.
        print("Detokenization started")
        with open("detokenized.txt", "w+") as detokenizer_output:
            cmd = "/home/nlp/opt/moses/scripts/tokenizer/detokenizer.perl -l en"
            args = shlex.split(cmd)
            detokenizer_p = subprocess.Popen(args, stdin=translate_p.stdout, stdout=detokenizer_output)
            translate_p.stdout.close()  # let SIGPIPE reach the translator if the detokenizer exits early
            detokenizer_p.wait()
            translate_p.wait()
            print("Translation and detokenization finished")

translator = Translator()
translator.pl_to_en("some_input_file.txt", "some_output_file.txt")
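You can take this one step further and chain all three stages with pipes, with no intermediate files at all. Below is a minimal sketch with a generic helper (`run_pipeline` is my name, not part of any library): the first command reads the input file, the last writes the output file, and everything in between streams through OS pipes. It assumes moses reads from stdin when `-input-file` is omitted; check your moses build before relying on that.

```python
import shlex
import subprocess

def run_pipeline(cmds, input_file, output_file):
    """Chain shell commands with OS pipes: the first reads input_file,
    the last writes output_file, intermediates stream through pipes."""
    procs = []
    with open(input_file) as fin, open(output_file, "w") as fout:
        prev_out = fin
        for i, cmd in enumerate(cmds):
            last = (i == len(cmds) - 1)
            p = subprocess.Popen(shlex.split(cmd),
                                 stdin=prev_out,
                                 stdout=fout if last else subprocess.PIPE)
            if prev_out is not fin:
                prev_out.close()  # close our copy so SIGPIPE can propagate
            prev_out = p.stdout
            procs.append(p)
    for p in procs:
        p.wait()

# The question's three steps would then be (paths as in the question;
# -input-file dropped so moses reads the tokenizer's output from stdin):
# run_pipeline([
#     "/home/nlp/opt/moses/scripts/tokenizer/tokenizer.perl -l pl",
#     "/home/nlp/opt/moses/bin/moses -f /home/nlp/Downloads/TED/tuning/moses.tuned.ini.1 -drop-unknown -th 8",
#     "/home/nlp/opt/moses/scripts/tokenizer/detokenizer.perl -l en",
# ], "some_input_file.txt", "some_output_file.txt")
```

Because every stage runs concurrently and only `wait()` is called at the end, the pipe buffers can never deadlock the parent the way a premature `wait()` on an unread `PIPE` can.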

Upvotes: 1
