Saturday, December 23, 2023

NAS hard drive is failing

Bad news from the NAS:

Category: Internal storage
Event: Drive 2 in DS718+ is failing
Time: 2023-12-23 13:40
Description: Drive 2 in DS718+ is severely damaged and is failing. Please back up your data immediately and then replace the drive.
Drive information:
Brand: WDC
Model: WD80EFZX-68UW8N0
Capacity: 7.3 TB
Serial number: -----
Firmware: 83.H0A83
S.M.A.R.T. Status: Failing
Bad sector count: 27
Drive reconnection count: 0
Drive re-identification count: 0
Please log in to nas for more information.

It was at 25 bad sectors a couple of weeks ago. No need to panic, because there's another backup of the NAS. I should write about that sometime.

I want to increase the capacity and upgrade to DSM 7, so here's the plan:

Buy 2 new drives and set up the DS718+ as if brand new.

Restore the configuration.

Mount the good drive in an external HDD bay and copy the data to the "new" NAS. The bay I have supports eSATA, so copy times should be OK.

To that end, I ordered two 12TB Seagate IronWolf drives from Amazon. I picked these because they were on sale. I think a 50% bump in capacity will do for now.

The NAS storage pool is configured for RAID 1, so I probably could have simply replaced the failing drive with a new 8TB unit.
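
The capacity numbers above can be sanity-checked with a bit of arithmetic. This is only a sketch, and it assumes DSM reports sizes in binary units (2^40 bytes per "TB") while drive vendors count decimal terabytes, which would explain why the 8TB drive shows up as 7.3 TB in the notification. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Convert a marketed capacity in decimal TB (10^12 bytes) to binary TiB (2^40 bytes).
tb_to_tib() {
    awk -v tb="$1" 'BEGIN { printf "%.1f\n", tb * 1e12 / 2^40 }'
}

# Percentage increase going from an old capacity to a new one.
pct_increase() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

tb_to_tib 8        # old drives: prints 7.3
tb_to_tib 12       # new drives: prints 10.9
pct_increase 8 12  # capacity bump: prints 50
```

With RAID 1 the usable space is that of a single drive, so under the same assumption the pool should go from roughly 7.3 to 10.9 in DSM's units.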

27 Dec 2023

The 12TB HDDs arrived. I would like to mount them in an old ATX tower, but it is tricky because only two of the screw holes line up with the carrier. Western Digital has a document that describes the mounting hole positions I refer to here. The carrier is designed for drives that have mounting holes at A7 and A6, but the new drives have them at A7 and A13 only. My old drives had all three (A7, A6, A13). The new drives will only be mounted for a short while for testing and the case will not be moved, so just two screws will do for now.

Useful commands. The server is running EndeavourOS, which is based on Arch.

# Replace /dev/sdb with the drive under test.
# Find out if SMART is supported and enabled:
smartctl --info /dev/sdb | grep 'SMART support is'
# Enable SMART:
smartctl --smart=on /dev/sdb
# Run a conveyance test to see if the drive was damaged in transit (about 2 minutes):
smartctl -t conveyance /dev/sdb
# See the drive's overall health assessment:
smartctl -H /dev/sdb
# Short test (about 1 minute):
smartctl -t short /dev/sdb
# See results of recent self-tests and info about the device:
smartctl -x /dev/sdb
# Long test (about 1051 minutes, or 17.5 hours):
smartctl -t long /dev/sdb
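
The long-test estimate is easier to plan around as hours and minutes. A minimal sketch, assuming the estimate the drive reports is reasonably accurate; minutes_to_hm is a made-up helper name, not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: turn a test duration in minutes into hours and minutes.
minutes_to_hm() {
    printf '%dh %02dm\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

minutes_to_hm 1051   # long test estimate for the 12TB drives: prints 17h 31m
```

On a system with GNU date (as on Arch), date -d '+1051 minutes' should print roughly when the long test will finish.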
 
Example smartctl -x output:

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 60%         7         -
# 2  Short offline       Completed without error       00%         1         -
# 3  Conveyance offline  Completed without error       00%         1         -
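
To avoid eyeballing that log while the long test runs, the output can be filtered. This is a rough sketch that assumes the log lines look like the example above; selftest_remaining and selftest_passed are hypothetical helper names, and other drives or smartctl versions may format the log differently.

```shell
#!/bin/sh
# Hypothetical helpers that read a self-test log (as printed by smartctl -x) on stdin.

# Print the percentage remaining for a test that is still running, if any.
selftest_remaining() {
    awk '/^# *[0-9]+ .*in progress/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+%$/) { sub(/%/, "", $i); print $i; exit }
    }'
}

# Count the log entries that finished cleanly.
selftest_passed() {
    grep -c 'Completed without error'
}

# Usage (needs root and a real drive):
#   smartctl -x /dev/sdb | selftest_remaining
```

Fed the example log above, selftest_remaining prints 60 and selftest_passed prints 2.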

 

References

Synology Knowledge Center - replace a drive

Synology Knowledge Center - max disk size

DS718+ installation guide

Synology Knowledge Center - backup and restore configuration

https://wiki.archlinux.org/title/S.M.A.R.T.
