roman

Reputation: 670

ceph pgs marked as inactive and undersized+peered

I installed a Rook (rook.io) Ceph storage cluster. Before installation, I cleaned up the previous installation as described here: https://rook.io/docs/rook/v1.7/ceph-teardown.html

The new cluster was provisioned correctly; however, Ceph does not become healthy after provisioning and stays stuck in this state (output from the rook-ceph-tools pod):

  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0 B
    usage:   20 MiB used, 15 TiB / 15 TiB avail
    pgs:     100.000% pgs not active
             128 undersized+peered
[root@rook-ceph-tools-74df559676-scmzg /]# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META     AVAIL    %USE  VAR   PGS  STATUS
 0    hdd  3.63869   1.00000  3.6 TiB  5.0 MiB  144 KiB   0 B  4.8 MiB  3.6 TiB     0  0.98    0      up
 1    hdd  3.63869   1.00000  3.6 TiB  5.4 MiB  144 KiB   0 B  5.2 MiB  3.6 TiB     0  1.07  128      up
 2    hdd  3.63869   1.00000  3.6 TiB  5.0 MiB  144 KiB   0 B  4.8 MiB  3.6 TiB     0  0.98    0      up
 3    hdd  3.63869   1.00000  3.6 TiB  4.9 MiB  144 KiB   0 B  4.8 MiB  3.6 TiB     0  0.97    0      up
                       TOTAL   15 TiB   20 MiB  576 KiB   0 B   20 MiB   15 TiB     0                   
MIN/MAX VAR: 0.97/1.07  STDDEV: 0
[root@rook-ceph-tools-74df559676-scmzg /]# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME                               STATUS  REWEIGHT  PRI-AFF
-1         14.55475  root default                                                     
-3         14.55475      host storage1-kube-domain-tld                           
 0    hdd   3.63869          osd.0                               up   1.00000  1.00000
 1    hdd   3.63869          osd.1                               up   1.00000  1.00000
 2    hdd   3.63869          osd.2                               up   1.00000  1.00000
 3    hdd   3.63869          osd.3                               up   1.00000  1.00000

Can anyone explain what went wrong and how to fix this issue?

Upvotes: 1

Views: 5570

Answers (1)

roman

Reputation: 670

The problem is that all of the OSDs run on the same host while the pool's CRUSH rule uses host as the failure domain. Because each replica must then be placed on a distinct host and the cluster has only one host, the PGs can never reach their full replica count and remain undersized+peered. Switching the failure domain to osd fixes the issue. The default failure domain can be changed as described in https://stackoverflow.com/a/63472905/3146709
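For example, a minimal sketch of a Rook CephBlockPool that places replicas per OSD instead of per host (the pool name replicapool and size 3 are assumptions for illustration, not values from your cluster):

    apiVersion: ceph.rook.io/v1
    kind: CephBlockPool
    metadata:
      name: replicapool        # assumed pool name, adjust to your pool
      namespace: rook-ceph
    spec:
      failureDomain: osd       # spread replicas across OSDs rather than hosts
      replicated:
        size: 3

For an already existing pool you can alternatively create an osd-level CRUSH rule from the toolbox pod and assign it to the pool (rule name replicated_osd is again just an example):

    ceph osd crush rule create-replicated replicated_osd default osd
    ceph osd pool set replicapool crush_rule replicated_osd

Once the pool's rule maps replicas to individual OSDs, the PGs should activate and the cluster should reach HEALTH_OK. Note that with all OSDs on one host this setup does not survive the loss of that host.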

Upvotes: 4
