How do you check whether the disks in an array are working correctly, or whether one of them has failed?
If it is a ServeRAID* array, load the mptctl module and use the mpt-status program:
# mpt-status -i 1
ioc0 vol_id 1 type IM, 2 phy, 68 GB, state OPTIMAL, flags ENABLED
ioc0 phy 0 scsi_id 1 IBM-ESXS DTN073C3UCDY10FN S27P, 68 GB, state ONLINE, flags NONE
ioc0 phy 1 scsi_id 5 IBM-ESXS DTN073C3UCDY10FN S29C, 68 GB, state ONLINE, flags NONE
The "OPTIMAL" state tells us that everything is fine.
If the array is incomplete and we insert a blank disk, the kernel log shows the rebuild starting:
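The manual check above can be scripted, for example for a cron job. The following is only a sketch: check_mpt_status is a hypothetical helper name, and it assumes the output format shown above, with the keyword vol_id or phy in the second field and a "state ..." entry on each line.

```shell
#!/bin/sh
# check_mpt_status (hypothetical helper): reads mpt-status output on stdin.
# Returns 0 only if at least one volume line was seen, every volume line
# reports "state OPTIMAL" and every physical disk line reports "state ONLINE".
check_mpt_status() {
    awk '
        $2 == "vol_id" { seen = 1; if ($0 !~ /state OPTIMAL/) bad = 1 }
        $2 == "phy"    { if ($0 !~ /state ONLINE/) bad = 1 }
        END            { if (seen && !bad) exit 0; exit 1 }
    '
}

# Possible usage (requires the mptctl module and root privileges):
#   mpt-status -i 1 | check_mpt_status || echo "RAID array needs attention"
```

Parsing the text rather than relying on the exit code of mpt-status keeps the check independent of the tool version, at the cost of depending on the output layout.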
Sep 29 12:48:37 c1 kernel: [50303.570505] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 12:48:37 c1 kernel: [50303.570515] mptbase: ioc0: PhysDisk is now online, out of sync
Sep 29 12:48:37 c1 kernel: [50303.573240] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 12:48:37 c1 kernel: [50303.573248] mptbase: ioc0: PhysDisk is now initializing, out of sync
Sep 29 12:48:37 c1 kernel: [50303.852611] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 12:48:37 c1 kernel: [50303.852619] mptbase: ioc0: PhysDisk is now online, out of sync
Sep 29 12:48:37 c1 kernel: [50303.866114] scsi host2: mptspi: ioc0: Integrated RAID detects new device 1
Sep 29 12:48:37 c1 kernel: [50304.261578] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 1
Sep 29 12:48:37 c1 kernel: [50304.261588] mptbase: ioc0: volume is now degraded, enabled, resync in progress
The rebuild takes a while; once it finishes, the log shows:
Sep 29 23:26:34 c1 kernel: [88580.917523] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 23:26:34 c1 kernel: [88580.917533] mptbase: ioc0: PhysDisk is now online
Sep 29 23:26:34 c1 kernel: [88580.918436] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 1
Sep 29 23:26:34 c1 kernel: [88580.918444] mptbase: ioc0: volume is now optimal, enabled, resync in progress
Sep 29 23:26:34 c1 kernel: [88580.921791] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 1
Sep 29 23:26:34 c1 kernel: [88580.921804] mptbase: ioc0: volume is now optimal, enabled
The array has been rebuilt. Rebuilding a RAID1 of 2 x 73 GB disks in an IBM x345 took a little over 10.5 hours.
If the controller uses the cciss driver, smartctl can report the state of each physical disk:
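The rebuild time can also be read straight from the kernel log instead of being worked out by hand. This is a sketch under assumptions: resync_hours is a hypothetical helper, and it expects the uptime stamp to appear as a single field such as [50304.261588], as in the log lines above. It measures from the first "volume is now degraded ... resync in progress" message to the last "volume is now optimal" message.

```shell
#!/bin/sh
# resync_hours (hypothetical helper): reads kernel log lines on stdin and
# prints the elapsed hours between the start and the end of a resync,
# using the [seconds.microseconds] uptime stamps. Returns 1 if either
# message is missing from the input.
resync_hours() {
    awk '
        # Return the numeric value of the first "[12345.678901]" field, or -1.
        function stamp(   i) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\[[0-9.]+\]$/)
                    return substr($i, 2, length($i) - 2) + 0
            return -1
        }
        /volume is now degraded/ && /resync in progress/ && start == "" { start = stamp() }
        /volume is now optimal/ { finish = stamp() }
        END {
            if (start == "" || finish == "") exit 1
            printf "%.1f\n", (finish - start) / 3600
        }
    '
}

# Possible usage (the log file path is an assumption, it differs per distribution):
#   grep 'mptbase: ioc0' /var/log/kern.log | resync_hours
```

Because the stamps count seconds since boot, the result is only meaningful when both messages come from the same boot.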
smartctl -d cciss,0 -a /dev/cciss/c0d0
smartctl -d cciss,1 -a /dev/cciss/c0d0
Device: IBM-ESXS BBD073C3ESTT0ZFN Version: JP86
Serial number: K407DKQK
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Thu Sep 30 00:24:47 2010 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 34 C
Current start stop count: 44 times
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:     590731       17         0         0          0       4913.778           0
write:         0        0         0         0          0       3478.495           0
Non-medium error count: 187
Last n error events log page
No self-tests have been logged
Long (extended) Self Test duration: 1440 seconds [24.0 minutes]
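Reading the full report for every disk gets tedious, so the health line can be extracted with a small filter. This is a minimal sketch: smart_health is a hypothetical helper, and it only handles the SCSI-style "SMART Health Status:" line shown above (ATA disks word this line differently). The number of disks in the usage comment is likewise an assumption.

```shell
#!/bin/sh
# smart_health (hypothetical helper): reads smartctl output on stdin and
# prints only the value of the "SMART Health Status" line, e.g. "OK".
smart_health() {
    sed -n 's/^SMART Health Status:[[:space:]]*//p'
}

# Possible usage for two physical disks behind the first cciss controller
# (requires smartmontools and root privileges):
#   for n in 0 1; do
#       printf 'disk %s: ' "$n"
#       smartctl -d cciss,"$n" -a /dev/cciss/c0d0 | smart_health
#   done
```

If a disk prints nothing here, the report for it did not contain the expected line and is worth reading in full.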