Hardware problem requiring node replacement

If one of the nodes suffers a failure that is difficult to repair (a dead motherboard, for example, or a lost disk with no RAID), it may become necessary to replace the server with a blank one.

From the cluster's point of view, the old node must be removed and the new one added, for both corosync/pacemaker and ceph.
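Because both layers are involved, it can help to look at the state of each one before removing the old node and again once the new one has joined. The following is a minimal sketch, not part of the documented procedure: it assumes the crmsh `crm` tool and the `ceph` CLI are available on the node where it is run, and the helper names `run_check` and `check_cluster` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: run a status command if its tool is installed, otherwise say so.
# ASSUMPTION: crm (crmsh) and ceph CLIs are present on a cluster node.

run_check() {
    # $1 is the tool name, the remaining arguments are passed to it unchanged.
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skip: $1 not found"
    fi
}

check_cluster() {
    # corosync/pacemaker view: node membership and resource state.
    run_check crm status
    # ceph view: overall health, monitors and OSDs.
    run_check ceph status
}
```

Calling `check_cluster` on a remaining node prints both views; on a machine without these tools it only reports that they were skipped.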

...

  1. The ansible/playbooks/replace_machine_remove_machine_from_cluster.yaml playbook removes a node from the cluster. Set the machine_to_remove variable to the hostname of the node to remove.
    Run the command below from the ansible project directory.

    cqfd run ansible-playbook -i /path/to/inventory.yaml -e machine_to_remove=HOSTNAME playbooks/replace_machine_remove_machine_from_cluster.yaml


  2. Install a new host with the ISO installer and a new IP address. Make sure it has the same hostname, IP address, etc. as the old node.
  3. Make the "cluster network" connections between hosts.
  4. Re-run the cluster_setup_debian.yml playbook to configure the new host in the cluster (more details here).
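The two playbook invocations in the steps above can be sketched as small helpers that only print the commands, so they can be reviewed before being run from the ansible project directory. The removal command is the one documented in step 1. The second command is an assumption: it supposes that cluster_setup_debian.yml sits under playbooks/ and takes the same inventory. The function names `remove_node_cmd` and `setup_cluster_cmd` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: build the command lines for replacing a node; nothing is executed.
set -euo pipefail

# Step 1: removal command, as documented above.
remove_node_cmd() {
    local inventory="$1" hostname="$2"
    printf 'cqfd run ansible-playbook -i %s -e machine_to_remove=%s playbooks/replace_machine_remove_machine_from_cluster.yaml\n' \
        "$inventory" "$hostname"
}

# Step 4: cluster setup command.
# ASSUMPTION: the playbook path (playbooks/) and the absence of extra
# variables are guesses, not taken from the documentation.
setup_cluster_cmd() {
    local inventory="$1"
    printf 'cqfd run ansible-playbook -i %s playbooks/cluster_setup_debian.yml\n' \
        "$inventory"
}
```

For example, `remove_node_cmd /path/to/inventory.yaml HOSTNAME` prints the removal command for that host. Steps 2 and 3 (installing the new host and cabling the cluster network) remain manual and take place between the two commands.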

Diagram: replace-dead-node