How do you check whether the disks in an array are working correctly, or whether one of them has failed?
If it is a ServeRAID* array, load the mptctl module (modprobe mptctl) and use the mpt-status program:
# mpt-status -i 1
ioc0 vol_id 1 type IM, 2 phy, 68 GB, state OPTIMAL, flags ENABLED
ioc0 phy 0 scsi_id 1 IBM-ESXS DTN073C3UCDY10FN S27P, 68 GB, state ONLINE, flags NONE
ioc0 phy 1 scsi_id 5 IBM-ESXS DTN073C3UCDY10FN S29C, 68 GB, state ONLINE, flags NONE
The OPTIMAL volume status tells us that everything is fine; both physical disks also report ONLINE.
If the array is incomplete (degraded) and we insert a blank disk, the kernel log shows that the rebuild starts:
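To run this check automatically (for example from cron), the mpt-status output can be parsed. Below is a minimal sketch in POSIX sh and awk. It assumes the line format shown above: volume lines contain vol_id and disk lines contain phy N scsi_id. The function name mpt_healthy is hypothetical. It returns 0 only if every volume is OPTIMAL and every physical disk is ONLINE, and it prints the offending lines otherwise.

```shell
# Hypothetical helper: read mpt-status output on stdin and report problems.
# Assumes the output format shown above (one "vol_id" line per volume,
# one "phy N scsi_id ..." line per physical disk).
mpt_healthy() {
    awk '
        / vol_id / {
            vols++
            # Volume lines must say "state OPTIMAL,"
            if ($0 !~ /state OPTIMAL,/) { print "volume: " $0; bad = 1 }
        }
        / phy [0-9]+ scsi_id / {
            # Physical disk lines must say "state ONLINE,"
            if ($0 !~ /state ONLINE,/) { print "disk: " $0; bad = 1 }
        }
        END {
            # No volume line at all is treated as an error too
            if (vols == 0) { print "no volume found"; bad = 1 }
            exit bad + 0
        }
    '
}
```

Usage, for example: mpt-status -i 1 | mpt_healthy || logger -p daemon.warning "RAID array not OPTIMAL"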
Sep 29 12:48:37 c1 kernel: [50303.570505] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 12:48:37 c1 kernel: [50303.570515] mptbase: ioc0:   PhysDisk is now online, out of sync
Sep 29 12:48:37 c1 kernel: [50303.573240] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 12:48:37 c1 kernel: [50303.573248] mptbase: ioc0:   PhysDisk is now initializing, out of sync
Sep 29 12:48:37 c1 kernel: [50303.852611] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 12:48:37 c1 kernel: [50303.852619] mptbase: ioc0:   PhysDisk is now online, out of sync
Sep 29 12:48:37 c1 kernel: [50303.866114] scsi host2: mptspi: ioc0: Integrated RAID detects new device 1
Sep 29 12:48:37 c1 kernel: [50304.261578] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 1
Sep 29 12:48:37 c1 kernel: [50304.261588] mptbase: ioc0:   volume is now degraded, enabled, resync in progress
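The relevant messages come in pairs: a "RAID STATUS CHANGE for ..." line followed by an "... is now ..." line. A minimal sketch that joins each pair into a single line, so the sequence of states is easier to read. It assumes syslog-formatted entries like the ones above (date and time in the first three fields); the function name raid_transitions is hypothetical.

```shell
# Hypothetical helper: condense mptbase RAID status messages from a
# syslog-formatted kernel log (stdin) into "time subject: new state".
raid_transitions() {
    awk '
        /RAID STATUS CHANGE for / {
            # Remember what changed, e.g. "PhysDisk 1 id=1" or "VolumeID 1"
            subj = $0
            sub(/.*RAID STATUS CHANGE for /, "", subj)
            next
        }
        subj != "" && / is now / {
            # The following line carries the new state
            state = $0
            sub(/.* is now /, "", state)
            print $1, $2, $3, subj ": " state
            subj = ""
        }
    '
}
```

For example raid_transitions < /var/log/kern.log, assuming kernel messages are logged there (as on Debian; the path differs between distributions).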
The rebuild takes a while; when it finishes, the log shows:
Sep 29 23:26:34 c1 kernel: [88580.917523] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=1
Sep 29 23:26:34 c1 kernel: [88580.917533] mptbase: ioc0:   PhysDisk is now online
Sep 29 23:26:34 c1 kernel: [88580.918436] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 1
Sep 29 23:26:34 c1 kernel: [88580.918444] mptbase: ioc0:   volume is now optimal, enabled, resync in progress
Sep 29 23:26:34 c1 kernel: [88580.921791] mptbase: ioc0: RAID STATUS CHANGE for VolumeID 1
Sep 29 23:26:34 c1 kernel: [88580.921804] mptbase: ioc0:   volume is now optimal, enabled
The array has been rebuilt. Rebuilding a 2×73 GB RAID1 in an IBM x345 took about 10.5 hours (from 12:48 to 23:26).
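The rebuild time can also be computed from the bracketed kernel timestamps, which are seconds since boot. A minimal sketch, assuming the log covers a single rebuild and the machine was not rebooted in between (otherwise the timestamps restart from zero); the function name rebuild_hours is hypothetical. It takes the first "volume is now degraded ... resync in progress" line as the start and the last plain "volume is now optimal, enabled" line as the end.

```shell
# Hypothetical helper: print the rebuild duration in hours from a kernel log
# on stdin. Exits with status 1 if the start or end message is missing.
rebuild_hours() {
    awk '
        # Extract the "[seconds.micro]" uptime stamp from a log line
        function ts(line) {
            if (match(line, /\[ *[0-9]+\.[0-9]+\]/))
                return substr(line, RSTART + 1, RLENGTH - 2) + 0
            return -1
        }
        start == "" && /volume is now degraded.*resync in progress/ { start = ts($0) }
        # Exact end of line: skips "optimal, enabled, resync in progress"
        /volume is now optimal, enabled$/ { end = ts($0) }
        END {
            if (start == "" || end == "") exit 1
            printf "%.2f\n", (end - start) / 3600
        }
    '
}
```

For the log lines above (50304.261588 s to 88580.921804 s) this prints 10.63, which matches the wall-clock times.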
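If the controller uses the cciss driver (HP Smart Array), smartctl lets us check each physical disk separately; the N in -d cciss,N selects the disk behind the controller. Below is the report for one of the disks: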
smartctl -d cciss,0 -a /dev/cciss/c0d0
smartctl -d cciss,1 -a /dev/cciss/c0d0
Device: IBM-ESXS BBD073C3ESTT0ZFN Version: JP86
Serial number: K407DKQK
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Thu Sep 30 00:24:47 2010 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature:     34 C
Current start stop count:      44 times
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:     590731       17         0         0          0       4913.778           0
write:         0        0         0         0          0       3478.495           0
Non-medium error count:      187
Last n error events log page
No self-tests have been logged
Long (extended) Self Test duration: 1440 seconds [24.0 minutes]
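For a quick summary, the two most telling lines are SMART Health Status and Elements in grown defect list. A minimal sketch that extracts them from smartctl -a output for SCSI disks, as shown above; the function names smart_summary and check_cciss_disks are hypothetical, and treating any grown defect as a failure is an assumption (a strict policy, not a smartctl rule).

```shell
# Hypothetical helper: summarise smartctl -a output (SCSI disk format) read
# from stdin; returns 0 only for health OK and an empty grown defect list.
smart_summary() {
    awk -F': *' '
        /^SMART Health Status:/           { health = $2 }
        /^Elements in grown defect list:/ { defects = $2 + 0 }
        END {
            printf "health=%s grown_defects=%d\n", health, defects
            if (health == "OK" && defects == 0) exit 0
            exit 1
        }
    '
}

# Hypothetical helper: check physical disks 0..N-1 behind /dev/cciss/c0d0.
check_cciss_disks() {
    count=$1
    rc=0
    n=0
    while [ "$n" -lt "$count" ]; do
        printf 'disk %s: ' "$n"
        smartctl -d "cciss,$n" -a /dev/cciss/c0d0 | smart_summary || rc=1
        n=$((n + 1))
    done
    return "$rc"
}
```

For the two disks queried above: check_cciss_disks 2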