Distroname and release: Debian Squeeze

Debian Linux software RAID with MDADM

With the help of mdadm it is possible to set up a software RAID.
This has some clear benefits: you do not depend on a specific controller. With a hardware controller, if it fails, you need the exact same model to be able to reconstruct your RAID. Since software RAID does not depend on any particular hardware, we are not as vulnerable to hardware failures.
This of course comes with a downside as well, since the performance is not as good as that of a real hardware RAID controller card.

That said, most cheap RAID controllers are really just emulated hardware RAID (in other words, software RAID).

Setup RAID during installation, Debian Squeeze

1) Boot the installer

2) Go through the required steps until you reach the point called [!!] Partition disks. Select Manual.

3) (This step might not be required.) Select Disk 1 and press enter. If it asks whether you want to create an empty partition table, select yes. (WARNING! This will erase ALL data!)

4) Go down to Disk 1, where it says free space, and press enter to create a partition. Give it a size, go to "Use as:" and select "physical volume for RAID". Repeat this step until all of your wanted partitions are created, make sure they are all marked as RAID, and set the first one bootable.
Now do the exact same for Disk 2.

Whether swap should be on RAID is debatable; see the link below for some detailed info.
http://tldp.org/HOWTO/Software-RAID-HOWTO-2.html

I want my partition layout to look like this.
Disk 1:
/ 15 gb (bootable)
swap 4 gb
/var 171.1 gb
/root 10 gb

Disk 2:
/ 15 gb (bootable)
swap 4 gb
/var 171.1 gb
/root 10 gb

No filesystem types or mountpoints are created at this time. To do this, go to "Configure Software RAID" and say Yes to writing the changes to disk.
"Create MD Device"
"Select your RAID type" (1 in my case)
"Select number of active devices for the RAID1 array" (this is the number of disks available, 2 in my case)
"Number of spare devices for the RAID1 array" (0, I do not have any spare disks)
Now select two devices. They are required to match, for example /dev/sda1 and /dev/sdb1. PLEASE MAKE SURE THE PARTITIONS ARE THE CORRECT SIZE, SO YOU ARE SELECTING THE CORRECT ONES.
If the partition layout was exactly the same for Disk 1 and Disk 2, the devices /dev/sda1 and /dev/sdb1 should match, /dev/sda2 and /dev/sdb2 should match, and so on.
I have, however, experienced that if you made a mistake during the partition layout (the order of the partitions), and you delete them and re-add them, then for example /dev/sda1 and /dev/sdb1 do not match!
In that situation I solved it by starting the partition layout from scratch, so the numbers match across the partitions.

Now continue with this step until you are done selecting and matching active partitions.

When you are done, select finish.

You will be brought back to the first partition screen, and you will now have the option to select filesystems and mountpoints for the newly configured RAID volumes.

RAID1 device #0
/ ext3, 15 gb
RAID1 device #1
swap 4 gb
RAID1 device #2
/var 171.1 gb
RAID1 device #3
/root 10 gb
And after this you will have to select "Finish partitioning and write changes to disk".

Continue and complete the installation.

Note:
You will now most likely see quite a lot of disk activity, since the RAID is currently out of sync and needs to sync. Performance will be somewhat degraded while this happens. It will take quite some time, depending on the size of the disks and the disk utilization.
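Once the freshly installed system is up, you can follow the resync progress from a shell with the same command used later in this guide:
cat /proc/mdstat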

Setup RAID on a running system

Installation:
aptitude install mdadm
During the package configuration you are asked whether mdadm devices are holding the root file system. In my case the root file system is on a USB stick, so I write "none" in this field. Answer yes to starting md arrays automatically.
Configure the mdadm arrays: in this example I am setting up a RAID5 array. My devices are as follows: /dev/sda /dev/sdb /dev/sdc
mdadm -C /dev/md0 -l 5 -n 3 /dev/sda /dev/sdb /dev/sdc -v
Alternatively, set up the sdc disk as a SPARE. This has the advantage that the disk will not see use and wear until another disk dies, but it reduces the RAID capacity. For example, 3 drives of 500 GB will give 500 GB in total, whereas the solution above gives 1 TB.
mdadm -C /dev/md0 -l 5 -n 2 /dev/sda /dev/sdb -x 1 /dev/sdc -v
This should print some information about the chunk size (512K by default), and if you have created partitions on any of these devices, it will warn you that they will be lost! Answer "yes" or just "y" to continue. It should then show the output: mdadm: array /dev/md0 started. Now the RAID build is in progress. This can take a LONG time, depending on the size of the disks, the speed of the hardware and so on. To see the status, check /proc/mdstat
cat /proc/mdstat
Or, if you want frequent updates, you can use the watch command, which shows a full-screen output. This example updates every 5 seconds.
watch -n 5 cat /proc/mdstat
In my case this will take a little more than 2 hours. Have no worries; in the meantime it is possible to partition the array, create filesystems, mount them and use them. Please notice that if you continue to work with the new array while it is building, for example creating the filesystem as we will do later, the build time will increase.
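You can also confirm the array's size and the number of active and spare devices at any point, which is an easy way to double-check the capacity trade-off between the two creation commands above:
mdadm --detail /dev/md0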

First, I will create a partition on the mdadm array. (Yes, I prefer cfdisk over fdisk.) It might warn you about "Unknown partition table type"; answer yes to start with a zero table. Create a partition with the Linux type, write the partition table, and quit.
cfdisk /dev/md0
Now you should have a partition at /dev/md0p1. Note: it is actually not needed to create a partition if you wish to use the entire array as one filesystem. I prefer to have partitions; I personally think it gives me a better overview. Run the command below if you want to put the filesystem directly on the whole array, or skip it.
mkfs.ext4 /dev/md0
If you did not do the above, create the filesystem on the new partition instead; this takes some time. Notice: doing so will increase the build time of the array if the build is not yet complete.
mkfs.ext4 /dev/md0p1
Add the array to the config file. If this is not done, it is possible that the array will be recognized as /dev/md127 instead of /dev/md0. First get the UUID of the array.
mdadm --detail --scan
The output should look something like the line below.
ARRAY /dev/md/0 metadata=1.2 name=thor:0 UUID=1878c959:78d3c43d:82f3ec63:3a410a8e
Use the UUID in the mdadm.conf file
/etc/mdadm/mdadm.conf
# definitions of existing MD arrays
ARRAY /dev/md/0 metadata=1.2 name=thor:0 UUID=1878c959:78d3c43d:82f3ec63:3a410a8e
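Alternatively, assuming /etc/mdadm/mdadm.conf does not already contain stale ARRAY lines, you can append the scan output directly and then refresh the initramfs so the array name is also correct at boot time:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u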
Create a mountpoint, and mount the volume.
mkdir /mnt/RAID5array
mount /dev/md0p1 /mnt/RAID5array
Configure fstab to automount this volume on boot. Insert the following line.
/etc/fstab
/dev/md0p1	/mnt/RAID5array	ext4 rw,user,auto	0	0
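To verify that the fstab entry is correct, you can unmount the volume again and let mount re-read it from fstab:
umount /mnt/RAID5array
mount -a
df -h /mnt/RAID5array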

Monitoring

It is crucial that we discover if something goes wrong with our software RAID. We can do this with e-mail alerts.
You must be able to send mail from this host before this will work.

Now we want to be sure that this is triggered when an event occurs.
"Reconfigure" mdadm, select Yes to start the MD monitoring daemon, and in the next step enter your e-mail address.
dpkg-reconfigure mdadm
Next we can test this by sending test e-mails for the mdadm volumes.
mdadm --monitor --scan --test
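If you prefer a non-interactive setup, the same can be achieved by putting the mail address directly in the mdadm config and restarting the monitoring daemon. A small sketch; the address is of course an example:
/etc/mdadm/mdadm.conf
MAILADDR root@example.com
Then restart the monitoring daemon (the standard Debian init script):
/etc/init.d/mdadm restart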

Increase performance

On RAID5+6 arrays only, increasing the stripe cache size can be a significant improvement, which I would highly recommend doing.
You can try other numbers, but this value works very well for me.
echo 16384 > /sys/block/md0/md/stripe_cache_size

Note: you have to add it to a startup script, or create a new one, copy it to /etc/init.d/ and activate it with insserv, so it runs at every boot (see the sketch below).
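A minimal sketch of such an init script, assuming the array is /dev/md0 and using md-stripe-cache as an example name:
/etc/init.d/md-stripe-cache
#!/bin/sh
### BEGIN INIT INFO
# Provides:          md-stripe-cache
# Required-Start:    $local_fs
# Required-Stop:
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Set stripe_cache_size for /dev/md0
### END INIT INFO

case "$1" in
  start)
    # Increase the RAID5/6 stripe cache (value is in pages per device)
    echo 16384 > /sys/block/md0/md/stripe_cache_size
    ;;
  *)
    # Nothing to do on stop/restart
    ;;
esac
exit 0
Make it executable and activate it:
chmod +x /etc/init.d/md-stripe-cache
insserv md-stripe-cache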

References and links:

https://raid.wiki.kernel.org/index.php/Linux_Raid

Replacing a failed drive

This will also work with encrypted MDADM using LUKS, and LVM.
DISK -> MD -> LUKS -> (LVM) -> FS

Start by locating the dead disk (in this example it is /dev/sda).
It could also be smartctl/smartd that has reported the disk as unhealthy, or even MDADM that has reported the array as faulty.
tail -f /var/log/messages |grep sd
Aug 28 18:40:47 server1 kernel: [2141433.351542] sd 0:0:0:0: [sda] Unhandled error code
Aug 28 18:40:47 server1 kernel: [2141433.351549] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 28 18:40:47 server1 kernel: [2141433.351557] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 28 18:40:47 server1 kernel: [2141433.351678] sd 0:0:0:0: [sda] Unhandled error code
Aug 28 18:40:47 server1 kernel: [2141433.351683] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 28 18:40:47 server1 kernel: [2141433.351690] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 3a 38 60 28 00 00 08 00
Aug 28 18:40:47 server1 kernel: [2141433.351810] sd 0:0:0:0: [sda] Unhandled error code
Aug 28 18:40:47 server1 kernel: [2141433.351815] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 28 18:40:47 server1 kernel: [2141433.351822] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
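Since smartctl/smartd was mentioned above, you can also ask the drive itself for its health status, assuming smartmontools is installed:
smartctl -H /dev/sda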
Example of faulty MDADM array, look for "(F)" and/or the sda disk.
cat /proc/mdstat 
Personalities : [raid1] 
md3 : active raid1 sda4[0](F) sdb4[1]
      9770936 blocks super 1.2 [2/1] [_U]
      
md2 : active raid1 sda3[0](F) sdb3[1]
      460057464 blocks super 1.2 [2/1] [_U]
      
md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      3905524 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda1[0](F) sdb1[1]
      14646200 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>
If in doubt about which drive is which, locate the serial number, and later remove the disk labeled with this serial number.
hdparm -I /dev/sda|grep -i serial
Serial Number:      WD-WCAPW7666349
hdparm -I /dev/sdb|grep -i serial
Serial Number:      WD-WCAPW7216370
Mark /dev/sda as faulty, and then remove the drive from the md devices.
mdadm /dev/md0 --fail /dev/sda1 
mdadm: set /dev/sda1 faulty in /dev/md0
mdadm /dev/md1 --fail /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md1
mdadm /dev/md2 --fail /dev/sda3
mdadm: set /dev/sda3 faulty in /dev/md2
mdadm /dev/md3 --fail /dev/sda4
mdadm: set /dev/sda4 faulty in /dev/md3
Remove the disk from the md devices
mdadm /dev/md0 --remove /dev/sda1
mdadm: hot removed /dev/sda1 from /dev/md0
mdadm /dev/md1 --remove /dev/sda2
mdadm: hot removed /dev/sda2 from /dev/md1
mdadm /dev/md2 --remove /dev/sda3
mdadm: hot removed /dev/sda3 from /dev/md2
mdadm /dev/md3 --remove /dev/sda4
mdadm: hot removed /dev/sda4 from /dev/md3
Now replace the disk. If it is not hot-swappable, shut down the machine first. Then dump the partition table from the remaining original disk and write it to the new disk!
You can also just back up the partition table of the original disk to a file and import it afterwards (see the sketch after the one-liner below).
In this example /dev/sda is the disk we have replaced, and where the partition layout needs to be recreated.
sfdisk -d /dev/sdb | sfdisk /dev/sda
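If you prefer the backup-file approach mentioned above, a small sketch (the file name is just an example):
sfdisk -d /dev/sdb > sdb-partition-table.txt
sfdisk /dev/sda < sdb-partition-table.txt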
Another approach is simply to use fdisk.
fdisk -l /dev/sdb

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2320

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048    29296639    14647296   fd  Linux raid autodetect
/dev/sdb2        29296640    37109759     3906560   fd  Linux raid autodetect
/dev/sdb3        37109760   957227007   460058624   fd  Linux raid autodetect
/dev/sdb4       957227008   976771071     9772032   fd  Linux raid autodetect
Create the partition layout on the replaced disk. I am using the "Start" and "End" sectors from the output above when creating the layout, so I am certain that it is the same.
fdisk /dev/sda

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 
Using default value 1
First sector (2048-976773167, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-976773167, default 976773167): 29296639

Command (m for help): n
Partition type:
   p   primary (1 primary, 0 extended, 3 free)
   e   extended
Select (default p): p
Partition number (1-4, default 2): 
Using default value 2
First sector (29296640-976773167, default 29296640): 
Using default value 29296640
Last sector, +sectors or +size{K,M,G} (29296640-976773167, default 976773167): 37109759

Command (m for help): n
Partition type:
   p   primary (2 primary, 0 extended, 2 free)
   e   extended
Select (default p):  
Using default response p
Partition number (1-4, default 3): 
Using default value 3
First sector (37109760-976773167, default 37109760): 
Using default value 37109760
Last sector, +sectors or +size{K,M,G} (37109760-976773167, default 976773167): 957227007

Command (m for help): n
Partition type:
   p   primary (3 primary, 0 extended, 1 free)
   e   extended
Select (default e): p
Selected partition 4
First sector (957227008-976773167, default 957227008): 
Using default value 957227008
Last sector, +sectors or +size{K,M,G} (957227008-976773167, default 976773167): 976771071

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
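Note that fdisk creates the new partitions with the default type 83 (Linux), while /dev/sdb uses fd (Linux raid autodetect). With 1.2 metadata mdadm does not depend on the fd type, but if you want the tables to match exactly, you can change the type with fdisk's t command, or with sfdisk along these lines (a hedged sketch):
sfdisk --change-id /dev/sda 1 fd
sfdisk --change-id /dev/sda 2 fd
sfdisk --change-id /dev/sda 3 fd
sfdisk --change-id /dev/sda 4 fd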
Now we are ready to add the partitions to the md devices.
mdadm /dev/md0 --add /dev/sda1
mdadm /dev/md1 --add /dev/sda2
mdadm /dev/md2 --add /dev/sda3
mdadm /dev/md3 --add /dev/sda4
If the disk is part of a bootable OS/partition/array/disk, then we need to install grub in the MBR of the newly replaced disk. If this is not done and the other disk fails, there is no MBR to load, and the system will not boot.
grub-install /dev/sda
If grub-install is run while the array is in degraded mode, you will get this warning (NOT an error).
grub-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image.
You can ignore it, or run grub-install again after the rebuild is complete.

The rebuild can take a long time, depending on the partition size.

Known issues and fixes

md array inactive

cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sdb2[1](S)
3905536 blocks super 1.2
Stop the array (if possible)
mdadm --stop /dev/md1 
mdadm: stopped /dev/md1
Assemble the array when stopped
mdadm -A /dev/md1 /dev/sdb2
mdadm: /dev/md1 assembled from 1 drive - need all 2 to start it (use --run to insist).
Force start
mdadm -A /dev/md1 /dev/sdb2 --run
mdadm: /dev/md1 has been started with 1 drive (out of 2)
Lastly add the secondary disk
mdadm /dev/md1 --add /dev/sda2
mdadm: added /dev/sda2

mdadm: Cannot open /dev/sda1: Device or resource busy

Check what's using sda1
cat /proc/mdstat
md0 : active raid1 sdb1[2]
      14646200 blocks super 1.2 [2/1] [U_]
      
md2 : active raid1 sdb3[2]
      460057464 blocks super 1.2 [2/1] [U_]
      
md1 : active (auto-read-only) raid1 sdb2[2]
      3905524 blocks super 1.2 [2/1] [U_]
      
md3 : active raid1 sdb4[2]
      9770936 blocks super 1.2 [2/1] [U_]
      
md127 : inactive sda[0](S)
      488385560 blocks super 1.2

Stopping md arrays

Then stop the md array.
Warning: be careful if it is actively in use and/or part of the OS itself.
mdadm --stop /dev/md127
Now it should be possible to add the disks
mdadm /dev/md0 --add /dev/sda1 
mdadm: added /dev/sda1
mdadm /dev/md1 --add /dev/sda2
mdadm: added /dev/sda2
mdadm /dev/md2 --add /dev/sda3
mdadm: added /dev/sda3
mdadm /dev/md3 --add /dev/sda4
mdadm: added /dev/sda4

Running partprobe

Make sure you have run partprobe after sfdisk!
partprobe

Swapping

If you get the error on a swap device, try disabling swap first.
swapoff -a
Then add it again, and re-enable swap.
mdadm /dev/md1 --add /dev/sdd1
swapon -a
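Afterwards you can check that the swap device is active again:
swapon -s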

LVM

I've seen some cases where an older disk was used, and LVM was present.
pvdisplay /dev/sdd1 
  WARNING: PV /dev/sdd1 in VG VG_XenStorage-34507d32-dcd5-83fa-6a4b-70c21a34f3k8 is using an old PV header, modify the VG to update.
  WARNING: Device /dev/sdd1 has size of 62498816 sectors which is smaller than corresponding PV size of 1465147120 sectors. Was device resized?
  WARNING: One or more devices used as PVs in VG VG_XenStorage-34507d32-dcd5-83fa-6a4b-70c21a34f3k8 have changed sizes.
  --- Physical volume ---
  PV Name               /dev/sdd1
  VG Name               VG_XenStorage-34507d32-dcd5-83fa-6a4b-70c21a34f3k8
  PV Size               <698,64 GiB / not usable <11,87 MiB
  Allocatable           yes 
  PE Size               4,00 MiB
  Total PE              178848
  Free PE               140401
  Allocated PE          38447
  PV UUID               dhMyiN-75V9-PvMo-SgV5-zseJ-id5p-kVBHKB
  
Then remove the VG (answer yes to everything).
vgremove VG_XenStorage-34507d32-dcd5-83fa-6a4b-70c21a34f3k8
Then try again:
mdadm /dev/md1 --add /dev/sdd1
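If mdadm still refuses the device after the volume group is gone, you may also need to wipe the remaining LVM label from the partition first (an optional extra step):
pvremove /dev/sdd1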

Do not trust the author's words! POC, tests and experience are key.
