The following is a guide I have assembled detailing the creation and management of a RAID 5 array running on Linux. It covers software RAID, as opposed to hardware RAID, which requires a separate hardware RAID controller.
Let us begin with some background information and requirements to make all this work. The information below works on CentOS 5.3 x86_64 with the 2.6.30 Linux Kernel.
For a software RAID 5 in Linux you need a minimum of 3 hard disks. All three disks should be of the same size, speed, and model. They don’t have to be, but matched disks are the ideal setup.
You also need md support compiled into your Kernel, and mdadm installed.
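To see why matching disk sizes matters, here is a minimal sketch of the usual RAID 5 capacity rule: one disk's worth of space goes to parity, and every member contributes only as much as the smallest disk. The function name raid5_usable and the sizes are made up for illustration; sizes are whole numbers in whatever unit you like (GB here).

```shell
# Usable capacity of a RAID 5 array: (number of disks - 1) * smallest disk.
# Arguments are disk sizes as whole numbers in one common unit.
raid5_usable() {
    if [ "$#" -lt 3 ]; then
        echo "RAID 5 needs at least 3 disks" >&2
        return 1
    fi
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo $(( ($# - 1) * smallest ))
}

raid5_usable 500 500 500    # prints 1000
raid5_usable 500 500 320    # prints 640, the 320 disk limits every member
```

In the second call the two larger disks each leave 180 units unused, which is why identical disks are the ideal case.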
CREATING AN ARRAY
The very first step is to actually install the hard drives. It is good practice to install them on the same controller. I used SATA drives for this guide. After installing them I had 3 new devices: sdc, sdd, and sde.
Next, you need to create partitions on the disks using fdisk. Create a single primary partition on each drive with a partition type of fd (Linux raid autodetect).
Each RAID partition created is seen by mdadm as a RAID device. We need to tell mdadm to create a new array using these devices. The following command will do so:
mdadm --create --verbose /dev/md2 --level=5 --auto=yes --raid-devices=3 /dev/sd[cde]1
After executing this command you will have a new RAID device, /dev/md2, and the array itself will be in the process of building.
You can watch the process with:
watch cat /proc/mdstat
Depending on the size of the disks you have installed this could take a very long time.
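If you would rather have a one-line summary than watch the whole file, the progress line can be picked out of /proc/mdstat with awk. This is only a sketch: md_progress is a name I made up, and the sample text below is illustrative, not captured from a real machine, so the exact layout on your kernel may differ slightly.

```shell
# Illustrative /proc/mdstat text for an array that is still building.
sample_mdstat_rebuilding() {
cat <<'EOF'
Personalities : [raid6] [raid5] [raid4]
md2 : active raid5 sde1[3] sdd1[1] sdc1[0]
      976767872 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 12.6% (61543424/488383936) finish=127.5min speed=55749K/sec

unused devices: <none>
EOF
}

# Print "action percent estimated-finish" for one array.
# Reads /proc/mdstat-style text on stdin; prints nothing when the array is idle.
md_progress() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { inside = 1; next }   # first line of the stanza
        inside && NF == 0     { inside = 0 }         # a blank line ends the stanza
        inside && /(resync|recovery|reshape) *=/ {
            action = ""; pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(resync|recovery|reshape)$/) action = $i
                else if ($i ~ /%$/) pct = $i
                else if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
            }
            if (pct != "") print action, pct, eta
        }
    '
}

sample_mdstat_rebuilding | md_progress md2    # prints: recovery 12.6% 127.5min
```

On a live system you would feed it the real file instead: md_progress md2 < /proc/mdstat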
Once the array has finished building, you should update the mdadm.conf file. This helps the Kernel detect and initialize the array at boot time. You should also run the following command again whenever you make changes to the array, e.g. adding disks.
mdadm --detail --scan > /etc/mdadm.conf
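One thing to be aware of: the shell empties /etc/mdadm.conf before mdadm even runs, so a failed scan leaves you with an empty file, and any other lines you kept in there are lost. Here is a more cautious sketch that keeps a backup and refuses to write empty output. The helper name replace_conf and the .bak suffix are my own choices, not anything mdadm provides.

```shell
# Replace a config file with new content read from stdin.
# The previous version is kept as <file>.bak, and nothing is changed
# if the new content turns out to be empty.
replace_conf() {
    conf=$1
    tmp=$(mktemp "${conf}.XXXXXX") || return 1
    cat > "$tmp"
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        echo "no output to write; leaving $conf untouched" >&2
        return 1
    fi
    if [ -e "$conf" ]; then
        cp -p "$conf" "${conf}.bak" || { rm -f "$tmp"; return 1; }
    fi
    chmod 644 "$tmp"
    mv "$tmp" "$conf"
}

# Intended use, as root:
#   mdadm --detail --scan | replace_conf /etc/mdadm.conf
```

The write goes to a temporary file first and is moved into place only at the end, so the config is never left half-written.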
CREATING A FILE SYSTEM
After the creation of the array is complete it is not usable until you put a file system onto it. Deciding which file system is best is probably the most crucial step in this process. Here are some key factors to keep in mind:
When choosing a file system…
- How will I add disks?
- Can I resize my volume?
- Will expanding storage space require downtime?
The best solution in my opinion is the use of XFS as a file system. It allows growing of the file system without having to unmount the volume, so no downtime would be required for future improvements.
To create and manage an XFS file system you need to have several tools installed: xfsprogs, xfstools, and kmod-xfs. Make sure the xfs module is loaded, or just reboot, before creating the file system with the following command:
mkfs.xfs -f /dev/md2
That should complete relatively quickly, after which you can mount up /dev/md2 and make sure it is readable and writable. At this point we now have a working software RAID 5 array on Linux.
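A quick way to confirm the mounted array is readable and writable is to write a small file and read it back. The sketch below does just that; check_rw is a made-up helper name and /mnt/raid is a hypothetical mount point, so substitute your own.

```shell
# Write a small file into a directory, read it back, and clean up.
# Returns 0 only if the directory is both writable and readable.
check_rw() {
    dir=$1
    probe="$dir/.rw_probe.$$"
    { echo "raid-test" > "$probe"; } 2>/dev/null || return 1
    if [ "$(cat "$probe" 2>/dev/null)" = "raid-test" ]; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"
    return $status
}

# Intended use, as root (/mnt/raid is a hypothetical mount point):
#   mkdir -p /mnt/raid && mount /dev/md2 /mnt/raid
#   check_rw /mnt/raid && echo "array is readable and writable"
```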
EXPANDING THE ARRAY
In time it will be necessary to expand the array and add more storage by adding another hard disk. There are 5 steps involved in this procedure:
- Insert the new disk
- Create the RAID partition on the disk
- Add the new disk as a spare in the existing array
- Grow (expand) the array onto the new disk
- Grow the raid volume’s file system to utilize the new capacity
The first two steps have already been covered. Once they are completed we must tell mdadm there is a new device we would like it to use as a spare in our existing array. Assuming our new disk is sdf and we created the partition sdf1 properly, the following command does so:
mdadm /dev/md2 --add /dev/sdf1
If you look at /proc/mdstat it will be clear that there is now a spare in the array. Now we simply tell mdadm that we have one more drive in the actual array. So if you went from 3 drives to 4 you would use the following command:
mdadm /dev/md2 --grow --raid-devices=4
This will take an extremely long time, longer than the initial array creation. You can, again, watch the status in /proc/mdstat.
Finally, we expand the file system running upon the array. If you chose XFS as recommended, this is very simple and doesn’t require any unmounting of the file system. Just run xfs_growfs against the directory where /dev/md2 is mounted.
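The tool for growing a mounted XFS file system is xfs_growfs, and it is pointed at the mount point of the file system. Since this guide never names a mount point, here is a small sketch that looks it up from /proc/mounts first. The helper name mount_point_of is my own, and the sample line in the demo is illustrative.

```shell
# Print the mount point of a block device.
# Reads /proc/mounts-style text on stdin (device, mount point, type, ...).
mount_point_of() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}

# Demo on an illustrative line; /mnt/raid is a hypothetical mount point.
printf '/dev/md2 /mnt/raid xfs rw 0 0\n' | mount_point_of /dev/md2    # prints /mnt/raid

# Intended use, as root, once the reshape has finished:
#   mp=$(mount_point_of /dev/md2 < /proc/mounts)
#   xfs_growfs "$mp"
```

With no size option, xfs_growfs grows the file system to fill all the space the device now offers.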
In no time the file system will have expanded itself.
REPLACING A FAILED DISK
It is inevitable that you will one day need to replace one of the disks in your RAID array. Knowing what to do in that situation will save you from really screwing something up.
Replacing a disk is as simple as…
- Determine which device has failed
- Fail the device
- Remove the device from the array
- Remove the physical disk
- Replace the failed disk with a new disk
- Determine the new disk’s device name
- Partition the new disk
- Add the new raid device into the array
- Wait for the array to finish rebuilding
Looking at /proc/mdstat will tell you which disk has failed. There will be an “(F)” next to it. Let’s look at an example where sdg has failed and the RAID device on that disk is sdg1 and is a member of a RAID 5 array called md2.
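If you want to pull the failed member names out of /proc/mdstat without reading it by eye, something along these lines should work. It is a sketch: failed_members is a made-up name and the sample text is illustrative, modelled on the sdg1 example rather than copied from a real system.

```shell
# Illustrative /proc/mdstat text for an array with one failed member.
sample_mdstat_failed() {
cat <<'EOF'
Personalities : [raid6] [raid5] [raid4]
md2 : active raid5 sdg1[3](F) sdf1[2] sdd1[1] sdc1[0]
      1465151808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
EOF
}

# Print the members of one array that are marked (F).
# Reads /proc/mdstat-style text on stdin.
failed_members() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    name = $i
                    sub(/\[.*$/, "", name)   # strip the trailing [3](F) part
                    print name
                }
            }
        }
    '
}

sample_mdstat_failed | failed_members md2    # prints sdg1
```

On a live system: failed_members md2 < /proc/mdstat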
First, mark the device as failed:
mdadm /dev/md2 --fail /dev/sdg1
Now remove it from the array:
mdadm /dev/md2 --remove /dev/sdg1
Next, physically remove the failed disk from the system and replace it with a good disk. Depending on how the drives were detected, drives may not be labeled (sda, sdb, etc) in an order you might expect to match up with the physical connections. The following command will match drive labels with serial numbers so you can be assured you remove the actual bad drive.
ls -l /dev/disk/by-id
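The listing can get long on a machine with many disks. Here is a small sketch that prints only the by-id names pointing at one device, so you can read the model and serial number straight off. The function name ids_for_device is my own; the directory is a parameter only so the function can be tried against a test tree.

```shell
# List the names in /dev/disk/by-id that point at a given kernel device name.
# Usage: ids_for_device sdg [directory]
ids_for_device() {
    dev=$1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        target=$(readlink "$link")
        if [ "${target##*/}" = "$dev" ]; then
            echo "${link##*/}"
        fi
    done
}

# Intended use:
#   ids_for_device sdg
```

Match the serial number in the printed name against the label on the physical drive before pulling it.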
After inserting the new disk, you will want to look at the output of dmesg to determine the device name Linux assigns this disk. Let’s continue assuming the new disk was named sdg.
We now partition the new disk. This is even easier than before because we can use sfdisk to make it have the same partition table as other disks in the array. To make it look just like sdf you would use the following command:
sfdisk -d /dev/sdf | sfdisk /dev/sdg
Finally, we add the new raid device into the array:
mdadm /dev/md2 --add /dev/sdg1
Then, just keep an eye on /proc/mdstat until the RAID is completely rebuilt. It could take a while.
RECONSTRUCTING A BROKEN ARRAY
In time things may go horribly wrong. For example, losing more than one disk in an array is very possible. Although it is likely that only one of the failed disks is actually bad, it is difficult to determine which one. In this scenario we can often still recover the array.
In the event of total hardware failure, it is possible to move all the RAID members for the array into another machine. In this case, /etc/mdadm.conf would not be populated with the UUID for the array to be rescued. Mdadm has a special assemble mode just for this case.
mdadm --assemble /dev/md2 --verbose /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdg1
This will attempt to reassemble an array using the RAID members sdc1, sdd1, sdf1, and sdg1. Notice that it is not necessary to specify the RAID level. It is, however, necessary to specify the RAID device the array should be assembled into. You must choose a device, /dev/md2 in this case, that is not in use.
Now, if things are worse off, the above command would have failed with a message saying something along the lines of “could not assemble array with only 2 out of 4 members”. This would be the message for a failed RAID 5 with 4 member disks. Healthy is 4/4, degraded is 3/4, so an array with only 2/4 cannot start. We can recover from this, however.
mdadm --assemble --force /dev/md2 --verbose /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdg1
In this command we simply force the assembly. What we end up with is likely a degraded array. Mdadm works backwards for us while assembling: it recovers the members of the array which failed in domino-effect fashion after the initial failed member. All that is left will be a single failed member. Just re-add the failed member as explained in the previous section, “Replacing a Failed Disk”. If it adds back in properly and the array resyncs without problems, then the disk is not bad after all.
mdadm /dev/md2 --add /dev/sdg1
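The member counts used in this section follow from RAID 5 tolerating exactly one missing disk. As a small sketch of that rule (raid5_state is a made-up name, not an mdadm feature):

```shell
# Classify a RAID 5 array from its member counts.
# Usage: raid5_state <total members> <working members>
raid5_state() {
    total=$1
    active=$2
    if [ "$active" -ge "$total" ]; then
        echo "healthy"
    elif [ "$active" -eq $((total - 1)) ]; then
        echo "degraded"
    else
        echo "cannot start"
    fi
}

raid5_state 4 4    # prints healthy
raid5_state 4 3    # prints degraded
raid5_state 4 2    # prints cannot start
```

The forced assemble is what takes an array from the last case back to the second one; re-adding the remaining member and letting it resync brings it back to the first.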
Finally, after we have rescued our array by using mdadm’s assemble mode, we need to regenerate the mdadm.conf file so the array can be redetected properly upon reboot.
mdadm --detail --scan > /etc/mdadm.conf