m09

Broken pipe during a subprocess stdin.write

I use a local server, listening on port 2020, to tag sentences.

For example, if I send "Je mange des pâtes ." to port 2020 through the client used below, the server answers "Je_CL mange_V des_P pâtes_N ._.". The answer is always exactly one line, as long as my input is not empty.

I currently have to tag 9,568 files through this server. The first 9,483 files are tagged as expected. After that, the input stream seems to be closed, full, or something else, because I get an IOError, specifically a "Broken pipe" error, when I try to write to stdin.

When I skip the first 9,483 files, the remaining ones are tagged without any issue, including the one that caused the first error.

My server doesn't produce any error log indicating that something fishy happened. Am I handling something incorrectly? Is it normal for the pipe to fail after a while?
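
A broken pipe (errno EPIPE) has a precise meaning: the process is writing to a pipe that no process has open for reading any more. A pipe that is merely full does not raise it; with the default blocking pipes, write() simply waits until the reader drains some data. The error therefore suggests that the java client stopped reading its stdin, most commonly because it exited. A minimal Python 3 sketch that reproduces the error with a stand-in child (plain Python that exits at once, not the tagger); `demo_broken_pipe` is a hypothetical name, and Python 2 reports the same condition as an IOError whose errno is EPIPE:

```python
import errno
import subprocess
import sys


def demo_broken_pipe():
    """Write to the stdin of a child that has already exited.

    Returns the errno of the resulting error, or None if the write succeeded.
    """
    # Stand-in child: exits immediately without reading its stdin.
    proc = subprocess.Popen([sys.executable, "-c", "pass"],
                            stdin=subprocess.PIPE)
    proc.wait()  # after this, nobody holds the read end of the pipe
    try:
        proc.stdin.write(b"Je mange des pates .\n")
        proc.stdin.flush()  # stdin is buffered: the real write happens here
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass  # the unsent bytes are still buffered; just discard them
    return None


print(demo_broken_pipe() == errno.EPIPE)  # True on POSIX systems
```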

import codecs
from subprocess import Popen, PIPE

# JAR and SUMMARY are defined elsewhere in the script
log = codecs.open('stanford-tagger.log', 'w', 'utf-8')
p1 = Popen(["java",
            "-cp", JAR,
            "edu.stanford.nlp.tagger.maxent.MaxentTaggerServer",
            "-client",
            "-port", "2020"],
           stdin=PIPE,
           stdout=PIPE,
           stderr=log)

fhi = codecs.open(SUMMARY, 'r', 'utf-8') # a descriptor of the files to tag

for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        p1.stdin.write(' '.join(tokens).encode('utf-8') + '\n')
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    # Here I do something with result...
fhi.close()
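
Since each answer is one line of space-separated word_TAG tokens, the result can be split back into (word, tag) pairs. A minimal Python 3 sketch, assuming the tag is whatever follows the last underscore of each token (true of the example above, including "._."); `parse_tagged` is a hypothetical helper:

```python
def parse_tagged(line):
    """Split a tagger answer such as 'Je_CL mange_V' into (word, tag) pairs.

    Assumes each space-separated token is word_TAG, with the tag after the
    LAST underscore, so words that themselves contain '_' stay intact.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            raise ValueError("token without a tag: %r" % token)
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged("Je_CL mange_V des_P pâtes_N ._.")
# pairs == [('Je', 'CL'), ('mange', 'V'), ('des', 'P'), ('pâtes', 'N'), ('.', '.')]
```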

Answers (1)

Aya

In addition to my comments, I might suggest a few other changes...

for i, line in enumerate(fhi, 1):
    if i % 500 == 0:  # report progress every 500 documents
        print "Tagged " + str(i) + " documents..."
    tokens = ... # a list of words, can be quite long
    try:
        s = ' '.join(tokens).encode('utf-8') + '\n'
        assert s.find('\n') == len(s) - 1       # Make sure s has exactly one newline, at the end
        p1.stdin.write(s)
        p1.stdin.flush()                        # Make sure the line is written to the pipe, not left in a buffer
    except IOError:
        print 'bouh, I failed ;(('
    result = p1.stdout.readline()
    assert result                               # Make sure we got something back ('' means EOF)
    assert result.find('\n') == len(result) - 1 # Make sure result has exactly one newline, at the end
    # Here I do something with result...
fhi.close()

...but given there's also a client/server about which we know nothing, there are a lot of places where it could be going wrong.
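
One way to narrow it down from the Python side is to turn both failure modes (a write that raises, or a readline that returns '' at EOF) into one explicit error that also reports the child's exit code; a non-zero code, or a negative one meaning the child was killed by a signal on POSIX, would point at the java client rather than the loop. A minimal Python 3 sketch; `query` and `ChildDiedError` are hypothetical names, the stand-in child is plain Python rather than the tagger, and it assumes, as in the question, exactly one answer line per non-empty input line:

```python
import subprocess
import sys


class ChildDiedError(RuntimeError):
    """Raised when the child stops answering; carries its exit code."""

    def __init__(self, what, returncode):
        super().__init__("%s (child exit code: %r)" % (what, returncode))
        self.returncode = returncode


def query(proc, text, timeout=10):
    """Send one line to a text-mode proc and return its one-line answer."""
    if not text or "\n" in text:
        raise ValueError("input must be a single non-empty line")
    try:
        proc.stdin.write(text + "\n")
        proc.stdin.flush()  # push the line into the pipe right away
    except BrokenPipeError:
        # Nothing reads the child's stdin any more; it has most likely exited.
        # wait() raises subprocess.TimeoutExpired if it is in fact still alive.
        raise ChildDiedError("write failed", proc.wait(timeout=timeout))
    answer = proc.stdout.readline()
    if not answer:
        # EOF on stdout: the child closed it, usually by exiting.
        raise ChildDiedError("no answer", proc.wait(timeout=timeout))
    return answer.rstrip("\n")


if __name__ == "__main__":
    # Stand-in child: answers two lines in upper case, then exits with code 3.
    child = ("import sys\n"
             "for _ in range(2):\n"
             "    sys.stdout.write(sys.stdin.readline().upper())\n"
             "    sys.stdout.flush()\n"
             "sys.exit(3)\n")
    p = subprocess.Popen([sys.executable, "-c", child], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, universal_newlines=True)
    print(query(p, "je mange"))     # JE MANGE
    print(query(p, "des pates ."))  # DES PATES .
    try:
        query(p, "encore")
    except ChildDiedError as exc:
        print(exc.returncode)       # 3
```

With the tagger, `proc` would be a Popen like p1, opened in text mode (for example with encoding='utf-8'), so the first ChildDiedError would name the document at which the client went away.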

Does it work if you dump all the queries into a single file and then run it from the command line with something like...

java .... < input > output
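
The same test can be driven from Python: write every query to a file and hand that file to the client as its stdin, so there are no interleaved writes and reads. Counting the output lines then shows how far it got, since each non-empty query should yield exactly one line. A Python 3 sketch; `run_batch` and the default file names are hypothetical, and `cmd` would be the same java argument list used for p1:

```python
import subprocess


def run_batch(cmd, queries, in_path="queries.txt", out_path="answers.txt"):
    """Run cmd once with every query on its stdin, like `cmd < in_path > out_path`.

    Returns (exit code, number of answer lines). Each non-empty query is
    expected to produce exactly one answer line.
    """
    with open(in_path, "w", encoding="utf-8") as f:
        for q in queries:
            f.write(q + "\n")
    # Give the child the files directly instead of feeding it through a pipe.
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        returncode = subprocess.call(cmd, stdin=fin, stdout=fout)
    with open(out_path, encoding="utf-8") as f:
        n_answers = sum(1 for _ in f)
    return returncode, n_answers
```

If the count comes up short of len(queries), the first unanswered query is a good candidate for where the client stopped.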
