Reputation: 670
I installed a Rook Ceph storage cluster (rook.io). Before installation, I cleaned up the previous installation as described here: https://rook.io/docs/rook/v1.7/ceph-teardown.html
The new cluster was provisioned correctly, but Ceph does not become healthy after provisioning; it is stuck with all PGs inactive. The data section of ceph status shows:
  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0 B
    usage:   20 MiB used, 15 TiB / 15 TiB avail
    pgs:     100.000% pgs not active
             128 undersized+peered
[root@rook-ceph-tools-74df559676-scmzg /]# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META     AVAIL    %USE  VAR   PGS  STATUS
 0  hdd    3.63869   1.00000  3.6 TiB  5.0 MiB  144 KiB   0 B  4.8 MiB  3.6 TiB     0  0.98    0  up
 1  hdd    3.63869   1.00000  3.6 TiB  5.4 MiB  144 KiB   0 B  5.2 MiB  3.6 TiB     0  1.07  128  up
 2  hdd    3.63869   1.00000  3.6 TiB  5.0 MiB  144 KiB   0 B  4.8 MiB  3.6 TiB     0  0.98    0  up
 3  hdd    3.63869   1.00000  3.6 TiB  4.9 MiB  144 KiB   0 B  4.8 MiB  3.6 TiB     0  0.97    0  up
                     TOTAL    15 TiB   20 MiB   576 KiB   0 B  20 MiB   15 TiB      0
MIN/MAX VAR: 0.97/1.07  STDDEV: 0
[root@rook-ceph-tools-74df559676-scmzg /]# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME                          STATUS  REWEIGHT  PRI-AFF
-1         14.55475  root default
-3         14.55475      host storage1-kube-domain-tld
 0  hdd     3.63869          osd.0                      up       1.00000  1.00000
 1  hdd     3.63869          osd.1                      up       1.00000  1.00000
 2  hdd     3.63869          osd.2                      up       1.00000  1.00000
 3  hdd     3.63869          osd.3                      up       1.00000  1.00000
Can anyone explain what went wrong and how to fix the issue?
Upvotes: 1
Views: 5570
Reputation: 670
The problem is that all of the OSDs run on the same host while the pool's CRUSH failure domain is set to host. With a replicated pool of size greater than 1, Ceph must place each PG's replicas in distinct failure domains (here: hosts), which is impossible with a single host, so every PG stays undersized+peered. Switching the failure domain to osd fixes the issue. The default failure domain can be changed as described in https://stackoverflow.com/a/63472905/3146709
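As a minimal sketch of the fix, run the following from the rook-ceph-tools pod. The pool name replicapool is an assumption (it is the default name in the Rook examples; check ceph osd pool ls for the actual name), and replicated_osd is just an arbitrary name for the new rule:

# see which CRUSH rule the pool currently uses
ceph osd pool get replicapool crush_rule
# create a replicated rule whose failure domain is osd instead of host
ceph osd crush rule create-replicated replicated_osd default osd
# point the pool at the new rule; the PGs should then peer and become active+clean
ceph osd pool set replicapool crush_rule replicated_osd

For pools managed by Rook, the same effect can be achieved declaratively by setting spec.failureDomain: osd in the CephBlockPool resource before the pool is created.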
Upvotes: 4