highpost

Reputation: 1323

CompletedProcess from subprocess.run() doesn't return a string

According to the Python 3.5 docs, subprocess.run() returns a CompletedProcess object with a stdout member that contains "A bytes sequence, or a string if run() was called with universal_newlines=True." I'm only seeing a byte sequence and not a string, which I was assuming (hoping) would be equivalent to a text line. For example,

import pprint
import subprocess

my_data = ""
line_count = 0

proc = subprocess.run(
         args = [ 'cat', 'input.txt' ],
         universal_newlines = True,
         stdout = subprocess.PIPE)

for text_line in proc.stdout:
    my_data += text_line
    line_count += 1

word_file = open('output.txt', 'w')
pprint.pprint(my_data, word_file)
pprint.pprint(line_count, word_file)

Note: subprocess.run() is new in Python 3.5, so this code won't run in earlier versions.

Do I need to create my own line buffering logic, or is there a way to get Python to do that for me?

Upvotes: 20

Views: 58370

Answers (3)

jfs

Reputation: 414235

proc.stdout is already a string in your case; run print(type(proc.stdout)) to confirm. It contains all of the subprocess's output: subprocess.run() does not return until the child process has exited.
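A minimal sketch of that check, assuming a stand-in child process (a Python one-liner launched through sys.executable, so it does not depend on cat or input.txt). The names child, text_proc and bytes_proc are only for illustration:

```python
import subprocess
import sys

# Hypothetical child process: prints two lines and exits.
child = [sys.executable, "-c", "print('alpha'); print('beta')"]

# With universal_newlines=True the captured output is decoded to str.
text_proc = subprocess.run(child, stdout=subprocess.PIPE, universal_newlines=True)

# Without it, the captured output stays as bytes.
bytes_proc = subprocess.run(child, stdout=subprocess.PIPE)

print(type(text_proc.stdout))
print(type(bytes_proc.stdout))
```

In both cases the whole output is available only after run() returns.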

The loop for text_line in proc.stdout: is incorrect. Iterating over a string in Python yields characters (Unicode code points), not lines. To get lines, call:

lines = proc.stdout.splitlines()

The result may be different from .split('\n') if there are Unicode newlines in the string.
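To make the difference concrete, here is a small sketch on a hand-written string (not real subprocess output) that contains U+2028 LINE SEPARATOR, one of the Unicode line boundaries that str.splitlines() recognises:

```python
# Sample text: an ordinary newline, a Unicode line separator, a trailing newline.
text = "one\ntwo\u2028three\n"

# Iterating over a str yields characters, which is what the question's loop did.
chars = list(text)[:3]

# splitlines() splits on every line boundary and adds no trailing empty item.
by_splitlines = text.splitlines()

# split('\n') splits only on '\n' and keeps the empty string after the last one.
by_split = text.split("\n")

print(chars)
print(by_splitlines)
print(by_split)
```

If the output is known to contain only '\n' terminators, the two approaches differ only in the trailing empty string.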

If you want to read the output line by line (to avoid running out of memory for long-running processes):

from subprocess import Popen, PIPE

with Popen(command, stdout=PIPE, universal_newlines=True) as process:
    for line in process.stdout:
        do_something_with(line)

Note: process.stdout is a file-like object in this case. Popen() does not wait for the process to finish; it returns immediately once the child process has started. Here process is a subprocess.Popen instance, not a CompletedProcess.
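A runnable version of the loop above might look like this. The child command is an assumption (a Python one-liner standing in for command), and appending to a list stands in for do_something_with():

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child process that prints three lines.
child = [sys.executable, "-c", "for i in range(3): print('line', i)"]

seen = []
with Popen(child, stdout=PIPE, universal_newlines=True) as process:
    # process.stdout is a text-mode file object; iterating it yields lines.
    for line in process.stdout:
        seen.append(line.rstrip("\n"))

# Leaving the with block closes the pipe and waits for the child.
print(seen)
print(process.returncode)
```

Each line keeps its trailing newline when it comes out of the pipe, which is why the sketch strips it.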

If all you need is to count the number of lines (terminated by b'\n') in the output, like wc -l:

from functools import partial
from subprocess import Popen, PIPE

with Popen(command, stdout=PIPE) as process:
    read_chunk = partial(process.stdout.read, 1 << 13)
    line_count = sum(chunk.count(b'\n') for chunk in iter(read_chunk, b''))

See Why is reading lines from stdin much slower in C++ than Python?
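The chunked count can be wrapped in a small helper and checked against a child whose line count is known. This is only a sketch: count_lines, chunk_size and the child command are illustrative names, not part of the subprocess API:

```python
import sys
from functools import partial
from subprocess import Popen, PIPE

def count_lines(command, chunk_size=1 << 13):
    """Count b'\\n' bytes in the child's stdout without storing the output."""
    with Popen(command, stdout=PIPE) as process:
        read_chunk = partial(process.stdout.read, chunk_size)
        # iter(callable, sentinel) keeps reading until read() returns b'' (EOF).
        return sum(chunk.count(b"\n") for chunk in iter(read_chunk, b""))

# Hypothetical child process that writes exactly 1000 newline-terminated lines.
child = [sys.executable, "-c", "print('x\\n' * 1000, end='')"]
line_count = count_lines(child)
print(line_count)
```

Because the pipe is read in binary mode, no decoding happens and memory use stays bounded by the chunk size.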

Upvotes: 23

Rider

Reputation: 99

If you need the stdout lines in a list so that they are easier to manipulate, you just need to split the output on the "universal newline" separators:

import subprocess

nmap_out = subprocess.run(args = ['nmap', '-T4', '-A', '192.168.1.128'],
                          universal_newlines = True,
                          stdout = subprocess.PIPE)

nmap_lines = nmap_out.stdout.splitlines()
print(nmap_lines)

The output is:

['Starting Nmap 7.01 ( https://nmap.org ) at 2016-02-28 12:24 CET', 'Note: Host seems down. If it is really up, but blocking our ping probes, try -Pn', 'Nmap done: 1 IP address (0 hosts up) scanned in 2.37 seconds']

Upvotes: 7

fiacre

Reputation: 1180

You are seeing a string. Compare with what you get when universal_newlines is False:

import subprocess
proc = subprocess.run(
    args = [ 'cat', 'input.txt' ],
    universal_newlines = False,
    stdout = subprocess.PIPE)

print (type(proc.stdout))

<class 'bytes'>

run() calls Popen.communicate(). From the documentation:

communicate() returns a tuple (stdout_data, stderr_data). The data will be bytes or, if universal_newlines was True, strings.
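As a rough illustration of that tuple, the sketch below calls communicate() directly on a Popen object. The child command is an assumption (a Python one-liner that writes to both streams):

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child process that writes one line to stdout and one to stderr.
child = [
    sys.executable,
    "-c",
    "import sys; print('out'); print('err', file=sys.stderr)",
]

with Popen(child, stdout=PIPE, stderr=PIPE, universal_newlines=True) as process:
    # communicate() reads both pipes to EOF and waits for the child to exit.
    stdout_data, stderr_data = process.communicate()

print(type(stdout_data), repr(stdout_data))
print(type(stderr_data), repr(stderr_data))
```

Dropping universal_newlines=True from the call would make both items bytes instead of str.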

Have a look here for more explanation and other shell interactions.

Upvotes: 1
