Go's SSH client and PTY on AIX

Question

I doubt I'll get an answer here, as AIX is quite rare, but I should at least try.

The background

We have a program that uses the golang.org/x/crypto/ssh library to connect to remote services and do some things. The program is part of a large service and is widely tested by end users. It works without issues (at least connection-related ones) not only with all Linux-based clients (including quite old ones like Ubuntu 12.02) but also with clients on FreeBSD, OpenBSD, NetBSD, Mac OS X, Solaris SPARC, HP-UX and other *nixes. So it looks like the only thing it wasn't tested on is Samsung refrigerators. And yesterday I was sure it would be able to connect to a refrigerator and do what is needed without any issues. But that was yesterday...

The problem

Today we decided to add AIX support to our program, and we partly failed.

The problem description is simple: after a PTY request the program stops working. I can call session.RequestPty and it returns without any issues, but when I then try to execute a command, the app just hangs. No errors, nothing. It just hangs.

When does it work?

It works in PuTTY/KiTTY, so I am able to connect to the remote host.
If I remove the RequestPty call, everything works. But we need a PTY for sudo.
It works without issues if I request session.Shell, even with a PTY requested. So if I write a kind of interactive shell, it works perfectly.

What I have tried so far

I debugged as far as I could. The last call that executes is ch.sendMessage(msg) in ssh/channel.go: it writes the packet and that's all. No data is returned from the remote host.

For the tests I used three versions of AIX: 5.3, 6.1 and 7.1. No difference.

The OpenSSH versions differ:

5.3 - OpenSSH_5.2p1, OpenSSL 0.9.8k 25 Mar 2009
6.1 & 7.1 - OpenSSH_6.0p1, OpenSSL 1.0.1e 11 Feb 2013

All machines run in LPARs, but I doubt this is related to the issue.

I have no idea what is wrong, and I can't even say whether this is a common AIX issue or specific to our test machines. Here is a sample program that should print IT WORKS if it works:

package main

import (
    "golang.org/x/crypto/ssh"
)

func main() {
    server := "127.0.0.1:22"
    user := "root"
    p := "password"

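    config := &ssh.ClientConfig{
        User: user,
        Auth: []ssh.AuthMethod{ssh.Password(p)},
        // Current versions of the library reject a nil HostKeyCallback.
        // Skipping host key verification is acceptable only for a throwaway test.
        HostKeyCallback: ssh.InsecureIgnoreHostKey(),
    }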
    conn, err := ssh.Dial("tcp", server, config)
    if err != nil {
        panic(err.Error())
    }
    defer conn.Close()
    session, err := conn.NewSession()
    if err != nil {
        panic(err.Error())
    }
    defer session.Close()

    // Comment out the block below and everything works
    modes := ssh.TerminalModes{
        ssh.ECHO:          0,
        ssh.TTY_OP_ISPEED: 14400,
        ssh.TTY_OP_OSPEED: 14400,
    }

    if err := session.RequestPty("xterm", 80, 40, modes); err != nil {
        panic(err.Error())
    }
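    // Comment out the block above and everything works

    // Run blocks until the remote command exits. With a PTY requested,
    // this call never returns against AIX.
    if err := session.Run("echo 1"); err != nil {
        panic(err.Error())
    }
    println("IT WORKS")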
}

If you have an AIX machine around and can run this code against it, I'd appreciate your feedback.

If you have any ideas (even crazy ones) about why it may fail and where else I can look, don't be shy.

Update (2017-03-02):

At the suggestion of @LorinczyZsigmond I launched sshd in debug mode. The results are a bit strange.

Here is part of the log from Debian 9.0 (OpenSSH_6.0p1 Debian-4+deb7u3, OpenSSL 1.0.1t 3 May 2016) after running the sample program:

debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/1
debug1: SELinux support disabled
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec

debug2: fd 3 setting TCP_NODELAY

debug3: packet_set_tos: set IP_TOS 0x10

debug1: Setting controlling tty using TIOCSCTTY.

debug2: channel 0: rfd 10 isatty
debug2: fd 10 setting O_NONBLOCK

debug3: fd 8 is O_NONBLOCK

debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain

It works as expected.

Now the same block from the AIX 7.1 (OpenSSH_6.0p1, OpenSSL 1.0.1e 11 Feb 2013) log:

debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/42
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Values: options.num_allow_users: 0
debug1: RLOGIN VALUE  :1
debug1: audit run command euid 0 user root command 'whoami'

setsid: Operation not permitted.

After "setsid: Operation not permitted." it does nothing until I kill it with Ctrl+C. When I kill it, it prints:

debug2: fd 4 setting TCP_NODELAY
debug3: packet_set_tos: set IP_TOS 0x10
debug2: channel 0: rfd 10 isatty
debug2: fd 10 setting O_NONBLOCK
debug3: fd 8 is O_NONBLOCK
debug2: notify_done: reading
Exiting on signal 2
debug1: do_cleanup
debug1: session_pty_cleanup: session 0 release /dev/pts/42
debug1: audit session close euid 0 user root tty name /dev/pts/42
debug1: audit event euid 0 user root event 12 (SSH_connabndn)
debug1: Return Val-1 for auditproc:0

And it sends the result of whoami back to the client. This looks like a bug in the SSH server, but is that possible for two different versions?

Another interesting fact: when I run sshd under truss (a kind of strace for AIX), the output looks like this:

debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_pty_req: session 0 alloc /dev/pts/42
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug1: Values: options.num_allow_users: 0
debug1: RLOGIN VALUE  :1
debug1: audit run command euid 0 user root command 'whoami'

debug2: fd 4 setting TCP_NODELAY

debug3: packet_set_tos: set IP_TOS 0x10

debug2: channel 0: rfd 10 isatty
debug2: fd 10 setting O_NONBLOCK

debug3: fd 8 is O_NONBLOCK

setsid: Operation not permitted.

debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed

But truss output is a bit stranger than strace output (at least for someone who doesn't use *nix trace tools on a daily basis), so I don't understand what is going on in the logs. For anyone more skilled with this stuff, here is part of the trace data, starting from "debug1: RLOGIN VALUE :1": http://pastebin.com/YdzQwbt2

Also, in the logs I found that session.Shell() works because it doesn't request a PTY. It starts an interactive session (or something like that). But in my case an interactive session is not an option.

Oleg Korchagin · Accepted Answer

Better late than never.

IBM said it was a bug in OpenSSH, a race condition during PTY allocation: https://www-01.ibm.com/support/docview.wss?uid=isg1IV82042

It is fixed in package openssh.base.server:7.5.102.1500.

It is strange that the bug only occurs on AIX and never on Linux. Nevertheless, the problem is solved in my case.
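If the client has to deal with many AIX hosts, one way to act on this is to compare each host's installed openssh.base.server level with the fixed one before requesting a PTY. The sketch below assumes you already have the level as a dotted string (on AIX it is typically reported by lslpp, but collecting it is out of scope here). The names parseLevel and atLeast, and the sample levels other than 7.5.102.1500, are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fixedLevel is the fileset level named in the answer above.
const fixedLevel = "7.5.102.1500"

// parseLevel splits a dotted level such as "7.5.102.1500" into integers.
func parseLevel(s string) ([]int, error) {
	parts := strings.Split(strings.TrimSpace(s), ".")
	out := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("bad level %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether level is numerically >= minimum, comparing field
// by field from the left. Missing trailing fields count as 0.
func atLeast(level, minimum string) (bool, error) {
	a, err := parseLevel(level)
	if err != nil {
		return false, err
	}
	b, err := parseLevel(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x != y {
			return x > y, nil
		}
	}
	return true, nil
}

func main() {
	// Hypothetical installed levels, used only to exercise the comparison.
	for _, lvl := range []string{"6.0.0.6103", "7.5.102.1500", "7.5.102.1600"} {
		ok, err := atLeast(lvl, fixedLevel)
		fmt.Println(lvl, ok, err)
	}
}
```

A plain string comparison would get this wrong (for example "10.0" sorts before "7.5" lexically), which is why each field is compared as a number.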
