For the sake of this post, let's assume that you have a RAID 1 array set up on Linux.
First, identify whether your RAID is healthy. If it is, this is what it looks like:
[root@node ~]# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda2[0] sdb2[1]
3650885632 blocks super 1.2 [2/2] [UU]
bitmap: 0/28 pages [0KB], 65536KB chunk
md127 : active raid1 sda1[0] sdb1[1]
255868928 blocks super 1.2 [2/2] [UU]
bitmap: 1/2 pages [4KB], 65536KB chunk
unused devices: <none>
If the RAID device is NOT healthy, the output looks like this:
[root@node ~]# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda2[1]
3650885632 blocks super 1.2 [2/1] [_U]
bitmap: 4/28 pages [16KB], 65536KB chunk
md127 : active raid1 sda1[1]
255868928 blocks super 1.2 [2/1] [_U]
bitmap: 2/2 pages [8KB], 65536KB chunk
unused devices: <none>
On both RAID devices, only block device sda is still active; sdb has disappeared, which the [2/1] [_U] status also reflects (one of the two mirror members is missing). Let's check whether sdb is indeed dead.
[root@node ~]# smartctl -i /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HP
Product: SOMEPRODUCTID
Revision: HPD5
Logical block provisioning type unreported, LBPME=-1, LBPRZ=0
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500846bacf7
Serial number: SOMESERIAL
Device type: disk
Transport protocol: SAS
Local Time is: Wed Feb 10 10:25:11 2016 CST
device is NOT READY (e.g. spun down, busy)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
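Before removing anything, it can also help to confirm how md itself sees the failed member. mdadm --detail lists each member device and its state (active sync, faulty, or removed) at the bottom of its output:
[root@node ~]# mdadm --detail /dev/md126
[root@node ~]# mdadm --detail /dev/md127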
Let's proceed to remove this device from the RAID arrays, if it has not already been removed automatically.
[root@node ~]# mdadm --manage /dev/md126 --fail /dev/sdb2
[root@node ~]# mdadm --manage /dev/md126 --remove /dev/sdb2
[root@node ~]# mdadm --manage /dev/md127 --fail /dev/sdb1
[root@node ~]# mdadm --manage /dev/md127 --remove /dev/sdb1
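A quick re-check should now show sdb gone from both arrays:
[root@node ~]# cat /proc/mdstat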
Now, go to the server, physically remove the old, dead drive, and replace it with a healthy new one. You can physically identify the dead drive (and its serial number) from the smartctl -i output above.
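If the serial number on the drive tray is hard to read, /dev/disk/by-id maps serial numbers to device nodes for drives that still register with the kernel, and, if your backplane supports it, the ledctl tool can blink the bay LED. (This assumes the ledmon package is available for your distribution.)
[root@node ~]# ls -l /dev/disk/by-id/ | grep -v part
[root@node ~]# ledctl locate=/dev/sdb   # assumes ledmon is installed and the enclosure supports LED control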
Once the server is booted back up, set up the new drive. To do so, we need a package called gdisk: install it on the server and clone the GPT partition table from the surviving drive.
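On an EL7 system like the one in this example, installing it is a one-liner:
[root@c125 ~]# yum install -y gdisk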
[root@c125 ~]# sgdisk -R /dev/sdb /dev/sda
The operation has completed successfully.
[root@c125 ~]# sgdisk -G /dev/sdb
The operation has completed successfully.
The first command replicates the partition table from the good RAID drive onto the newly added drive. The second command randomizes the GPT GUIDs (disk and partition) on the second drive, so that it is not an exact clone of the first.
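To verify the copy, sgdisk can print both partition tables: the partition sizes and type codes should match, while the GUIDs should now differ.
[root@c125 ~]# sgdisk -p /dev/sda
[root@c125 ~]# sgdisk -p /dev/sdb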
Add the newly readied drive into the RAID.
[root@c125 ~]# mdadm --manage /dev/md126 --add /dev/sdb2
mdadm: added /dev/sdb2
[root@c125 ~]# mdadm --manage /dev/md127 --add /dev/sdb1
mdadm: added /dev/sdb1
Now wait until the arrays are synced; you can monitor the rebuild as shown below, and then release the server back into the wild. Well, not literally.
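You can watch the recovery progress in /proc/mdstat, or have mdadm block until both arrays finish resyncing:
[root@c125 ~]# watch cat /proc/mdstat
[root@c125 ~]# mdadm --wait /dev/md126 /dev/md127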