GSP

Reputation: 175

Single-node ceph cluster unresponsive from client

I have attempted to set up a small single-node Ceph cluster for some proof-of-concept work with CephFS. The cluster is running CentOS 7 with:

# ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

The cluster appears healthy:

# ceph -s
  cluster:
    id:     fa18d061-b6fd-4092-bbe3-31f4f8493360
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum se-ceph1-dev
    mgr: se-ceph1-dev(active)
    mds: cephfs-1/1/1 up  {0=se-ceph1-dev=up:active}
    osd: 1 osds: 1 up, 1 in

  data:
    pools:   2 pools, 64 pgs
    objects: 22  objects, 2.2 KiB
    usage:   1.0 GiB used, 39 GiB / 40 GiB avail
    pgs:     64 active+clean

All ceph commands work perfectly on the OSD node (which is also the mon, mgr, and mds). However, any attempt to access the cluster as a client (default user admin) from another machine is completely ignored. For instance:

cephcli$ ceph status
2020-07-08 08:12:58.358 7fa4c568e700  0 monclient(hunting): authenticate timed out after 300
2020-07-08 08:17:58.360 7fa4c568e700  0 monclient(hunting): authenticate timed out after 300
2020-07-08 08:22:58.362 7fa4c568e700  0 monclient(hunting): authenticate timed out after 300
2020-07-08 08:27:58.364 7fa4c568e700  0 monclient(hunting): authenticate timed out after 300
2020-07-08 08:32:58.363 7fa4c568e700  0 monclient(hunting): authenticate timed out after 300
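
For anyone debugging something similar, a quick way to test raw TCP reachability of the monitor port from the client (assuming netcat is installed and the mon listens on 10.19.4.159:6789) is:

cephcli$ nc -vz 10.19.4.159 6789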

The client machine is running Ubuntu 18.04.1 and has the same release of Ceph installed as the OSD node:

cephcli$ ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

I have verified that no clients are blacklisted:

# ceph osd blacklist ls
listed 0 entries

I have verified that the various Ceph daemons are listening on their respective ports on the OSD node:

# netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN      32591/ceph-osd
tcp        0      0 0.0.0.0:6801            0.0.0.0:*               LISTEN      32591/ceph-osd
tcp        0      0 0.0.0.0:6802            0.0.0.0:*               LISTEN      32591/ceph-osd
tcp        0      0 0.0.0.0:6803            0.0.0.0:*               LISTEN      32591/ceph-osd
tcp        0      0 0.0.0.0:6804            0.0.0.0:*               LISTEN      33279/ceph-mds
tcp        0      0 0.0.0.0:6805            0.0.0.0:*               LISTEN      32579/ceph-mgr
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      13881/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      14038/master
tcp        0      0 10.19.4.159:6789        0.0.0.0:*               LISTEN      32580/ceph-mon
tcp6       0      0 :::22                   :::*                    LISTEN      13881/sshd

I have verified, using tcpdump on port 6789 on the OSD node, that the client is indeed sending requests to the OSD node:

# tcpdump -i ens192 port 6789 -x -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
08:42:05.183071 IP 10.19.4.84.37170 > 10.19.4.159.smc-https: Flags [S], seq 4146143942, win 64240, options [mss 1460,sackOK,TS val 1566694440 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c c7d9 4000 4006 55ca 0a13 0454
        0x0010:  0a13 049f 9132 1a85 f721 22c6 0000 0000
        0x0020:  a002 faf0 30cd 0000 0204 05b4 0402 080a
        0x0030:  5d61 dc28 0000 0000 0103 0307
08:42:05.383784 IP 10.19.4.84.37172 > 10.19.4.159.smc
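
Note that this capture only watched port 6789, so if a host firewall were actively rejecting the connections, the ICMP replies it generates would not appear in it. A broader capture that would also catch those (assuming the same interface) would be something like:

# tcpdump -i ens192 'port 6789 or icmp' -n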

I have verified that the /etc/ceph/ceph.client.admin.keyring file on the client contains the same key as the one on the OSD node.
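
For reference, one way to make that comparison, assuming the keyring lives in the default location on both machines, is to dump the key on the mon node and compare it with the client's copy:

# ceph auth get client.admin
cephcli$ cat /etc/ceph/ceph.client.admin.keyring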

I've checked the monitor log and see entries when I make requests on the OSD node:

2020-07-08 10:17:12.414 7f06268a3700  0 log_channel(audit) log [DBG] : from='client.? 10.19.4.159:0/3709075926' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch

However, there is nothing reflecting the requests I am making from the client node.
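
The entry above comes from the monitor's own log file, which on a default CentOS 7 package install should be at a path like the following (monitor name assumed to match the host name):

# tail -f /var/log/ceph/ceph-mon.se-ceph1-dev.log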

So requests are making it to the OSD node, but I'm not getting any response. Where have I gone wrong?

Upvotes: 1

Views: 3636

Answers (1)

GSP

Reputation: 175

In case anyone stumbles upon this, I found the answer! At least, the answer for my specific issue. My OSD host was set up with the default "defensive" firewall configuration: an iptables rule that rejected all incoming packets except SSH. To delete the rule (in my case):

sudo iptables -D INPUT -j REJECT --reject-with icmp-host-prohibited
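
If you want to confirm the rule is actually there before deleting it, listing the INPUT chain with line numbers will show the REJECT entry (the exact position and counters will of course differ):

sudo iptables -L INPUT -n -v --line-numbers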

Once I did that, the client could immediately connect. The Ceph troubleshooting guide actually mentions this in the "Clock Skews" section:

https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#clock-skews
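
A less drastic alternative to deleting the rule outright, if you want to keep the firewall, is to open only the Ceph ports (6789 for the mon, and 6800-7300 for the OSD/MGR/MDS daemons). With firewalld, if that is what is managing the rules on your host, that would look roughly like:

sudo firewall-cmd --permanent --add-port=6789/tcp
sudo firewall-cmd --permanent --add-port=6800-7300/tcp
sudo firewall-cmd --reload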

Upvotes: 2
