Tutankhamen

Reputation: 3562

Losing stdout data in python

I'm trying to make a Python script that runs a bash script on a remote machine via ssh and then parses its output. The bash script outputs a lot of data (about 5 megabytes of text / 50k lines) to stdout, and here is the problem: I get all the data in only ~10% of cases. In the other 90% of cases I get about 97% of what I expect, and it always seems to be trimmed at the end. This is what my script looks like:

import subprocess
import re
import sys
import paramiko

def run_ssh_command(ip, port, username, password, command):
    ssh = paramiko.SSHClient()    
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())                                                   
    ssh.connect(ip, port, username, password)                                                                   
    stdin, stdout, stderr = ssh.exec_command(command)                                                           
    output = ''                                                                                                 
    while not stdout.channel.exit_status_ready():                                                               
        solo_line = ''                                                                                          
        # Print stdout data when available                                                                      
        if stdout.channel.recv_ready():                                                                         
            # Retrieve up to 2048 bytes
            solo_line = stdout.channel.recv(2048)
            output += solo_line                                                                                 
    ssh.close()                                                                                                 
    return output                                                                                  

result = run_ssh_command(server_ip, server_port, login, password, 'cat /var/log/somefile')
print "result size: ", len(result)                                                                                    

I'm pretty sure the problem is that some internal buffer is overflowing, but which one, and how do I fix it?

Thank you very much for any tip!

Upvotes: 1

Views: 266

Answers (2)

Vasiliy Faronov

Reputation: 12310

When stdout.channel.exit_status_ready() starts returning True, there might still be a lot of data on the remote side, waiting to be sent. But you only receive one more chunk of 2048 bytes and quit.

Instead of checking the exit status, you could keep calling recv(2048) until it returns an empty string, which means that no more data is coming:

output = ''
next_chunk = True
while next_chunk:
    next_chunk = stdout.channel.recv(2048)
    output += next_chunk

But really you probably just want:

output = stdout.read()
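The drain-until-empty pattern can be demonstrated without any ssh involved, using a plain local socket pair (the variable names here are illustrative, not from the original code). Looping on `recv()` until it returns an empty bytes object collects everything the peer sent, no matter how the data is chunked:

```python
import socket
import threading

def drain(sock, chunk_size=2048):
    """Read until recv() returns b'', i.e. the peer has closed the connection."""
    output = b''
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # empty result means no more data is coming
            break
        output += chunk
    return output

# Demo: push ~1 MB through a local socket pair.
a, b = socket.socketpair()
payload = b'x' * (1024 * 1024)

# Send in a background thread (sendall blocks until the reader drains it).
sender = threading.Thread(target=lambda: (a.sendall(payload), a.close()))
sender.start()
received = drain(b)
sender.join()

print(len(received) == len(payload))  # True: nothing is trimmed
```

The same idea applies to `stdout.channel.recv()` in paramiko: the loop condition is "did I get an empty chunk?", not "has the remote command exited?".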

Upvotes: 1

sigman

Reputation: 1301

May I suggest a less crude way to execute commands over ssh, using the Fabric library. It could look like this (omitting ssh authentication details):

from fabric import Connection

with Connection('user@localhost') as con:
    res = con.run('~/test.sh', hide=True)
    lines = res.stdout.split('\n')
    print('{} lines read.'.format(len(lines)))

Given the test script ~/test.sh:

#!/bin/bash
for i in {1..1234}
do
  echo "Line $i"
done

all of the output is correctly consumed.
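One small caveat when counting lines this way: if the command's output ends with a newline (as `echo` output does), `str.split('\n')` yields a trailing empty string, so the count is off by one. `str.splitlines()` avoids that:

```python
# Typical command output: every echo adds a trailing newline.
out = "Line 1\nLine 2\nLine 3\n"

print(len(out.split('\n')))   # 4 -- includes a final empty string
print(len(out.splitlines()))  # 3 -- the actual number of lines
```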

Upvotes: 1
