Clearing UNC Hard Disk Errors

One of the advantages of running a Linux RAID configuration is that it simplifies clearing of UNC errors on your hard disks. If you have such a setup, you may follow the process below to clear the UNC errors from your disk and extends it's life a bit longer. The example here is for a RAID 1 configuration, but any RAID level will do (besides RAID 0).

Determine the disk (X) with the errors using smartctl

smartctl -A /dev/sda
smartctl -A /dev/sdb

Determine the LBA where the first error was found

smartctl -a /dev/sdX

Determine the partition (Y) that contains the LBA

sfdisk -l -uS /dev/sdX

Do not continue this process if there are excessive errors (80 or more).  Replace the disk immediately.

Determine the RAID array (Z) that the partition belongs to

cat /proc/mdstat

Fail and remove the partition from the RAID

mdadm /dev/mdZ --fail /dev/sdXY
mdadm /dev/mdZ --remove /dev/sdXY

You must zero the superblocks on the partition to allow a proper remirror

mdadm --zero-superblock /dev/sdXY

Re-add the partition to the RAID to initiate the remirror

mdadm /dev/mdZ --add /dev/sdXY

Monitor the remirror progress.  When complete, review smartctl to see if the errors are gone.

  • If errors are still there, confirm you have been working with the correct partition
  • If all errors cannot be cleared, replace the disk.

When all errors are cleared, run a long SMART test to confirm disk is healthy.

smartctl -t long /dev/sdX

It should complete without any read errors.

  • If more errors are found.  Repeat the process above.
  • Do not repeat the process any more than 2 times.  The drive is unhealthy at this point and should be replaced.
  • Do not repeat the process if there are excessive errors (80 or more).