Below is the zpool status output. The disk sdad has failed: ZFS no longer shows its device name, only a numeric GUID, and its state is FAILED.
shell> zpool status
pool: data
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: resilvered 799G in 3 days 03:18:40 with 0 errors on Thu Jul 25 20:53:36 2024
config:
NAME STATE READ WRITE CKSUM
data DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
< some lines omitted >
sdaa ONLINE 0 0 0
sdab ONLINE 0 0 0
sdac ONLINE 0 0 0
17764873689765375348 FAILED 0 0 0 was /dev/sdad1
sdae ONLINE 0 0 0
sdaf ONLINE 0 0 0
< some lines omitted >
sdbf ONLINE 0 0 0
sdbg ONLINE 0 0 0
sdbh ONLINE 0 0 0
errors: No known data errors
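With this many disks in one pool it is easy to miss the single bad row. As a rough sketch, the config table can be filtered with awk so that only rows whose state is not ONLINE remain. The function name list_unhealthy_vdevs is my own and not part of ZFS, and the filter assumes the NAME STATE READ WRITE CKSUM layout shown above.

```shell
# Print "name state" for every row of the `zpool status` config table
# whose state is not ONLINE. Reads the status text from stdin.
# Hypothetical helper: not a ZFS command.
list_unhealthy_vdevs() {
  awk '
    # The config table starts right after its header row.
    $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
    # The table ends at the "errors:" line.
    $1 == "errors:" { in_config = 0 }
    # Real rows have an upper-case state in column 2; this also skips
    # the "< ... omitted >" markers.
    in_config && NF >= 5 && $2 ~ /^[A-Z]+$/ && $2 != "ONLINE" { print $1, $2 }
  '
}
```

Typical use would be zpool status data | list_unhealthy_vdevs. The data and raidz2-0 rows are printed as well, because they are DEGRADED while one member is missing.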
Run zpool offline [pool name] [device name] to take this device offline. Because the device name is no longer shown, pass the GUID instead.
shell> zpool offline data 17764873689765375348
zpool status now shows device 17764873689765375348 (formerly sdad) as OFFLINE.
shell> zpool status
pool: data
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: resilvered 799G in 3 days 03:18:40 with 0 errors on Thu Jul 25 20:53:36 2024
config:
NAME STATE READ WRITE CKSUM
data DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
< some lines omitted >
sdaa ONLINE 0 0 0
sdab ONLINE 0 0 0
sdac ONLINE 0 0 0
17764873689765375348 OFFLINE 0 0 0 was /dev/sdad1
sdae ONLINE 0 0 0
sdaf ONLINE 0 0 0
< some lines omitted >
sdbf ONLINE 0 0 0
sdbg ONLINE 0 0 0
sdbh ONLINE 0 0 0
errors: No known data errors
Once the device is OFFLINE, run ssacli ctrl all show to find which slot your RAID controller is in. In my case it is slot 1.
shell> ssacli ctrl all show
Smart Array P440 in Slot 1 (sn: PDNMF0ARH140I4)
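Next, list the logical drives on slot 1 with ssacli ctrl slot=1 ld all show (ssacli ctrl all show config gives a fuller view that also lists the physical drives). Here Array AD, whose physicaldrive is 1I:1:30, contains logicaldrive 30, and that logical drive is in the Failed state.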
shell> ssacli ctrl slot=1 ld all show
Array AD (Failed)
logicaldrive 30 (3.64 TB, RAID 0, Failed)
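On a controller with dozens of single-disk RAID 0 arrays, picking out the failed logical drive by eye is tedious. A minimal sketch for extracting just the IDs, assuming the output format shown above (the helper name failed_logical_drives is hypothetical):

```shell
# Print the ID of every logical drive reported as Failed.
# Reads `ssacli ctrl slot=N ld all show` output from stdin.
# Hypothetical helper: relies on lines shaped like
#   logicaldrive 30 (3.64 TB, RAID 0, Failed)
failed_logical_drives() {
  awk '$1 == "logicaldrive" && /Failed\)/ { print $2 }'
}
```

For example: ssacli ctrl slot=1 ld all show | failed_logical_drives. With the output above this would print 30, which is the ID to use in the delete step.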
Now delete logicaldrive 30 with ssacli, and answer y to confirm the deletion.
shell> ssacli ctrl slot=1 ld 30 delete
Warning: Deleting an array can cause other array letters to become renamed.
E.g. Deleting array A from arrays A,B,C will result in two remaining
arrays A,B ... not B,C
Warning: Deleting the specified device(s) will result in data being lost.
Continue? (y/n) y
If you now grep for sdad in the lsblk output, it is gone.
shell> lsblk | grep ^sdad
< empty >
Recreate the RAID 0 logical drive on this physical drive.
shell> ssacli ctrl slot=1 create type=ld raid=0 drives=1I:1:30
Warning: Creation of this logical drive has caused array letters to become
renamed.
The device name for 1I:1:30 now appears again in the lsblk output.
shell> lsblk | grep sdad
sdad 65:208 0 3.6T 0 disk
Listing the physical drives shows the drive with status OK.
shell> ssacli ctrl slot=1 pd all show
Array AD
physicaldrive 1I:1:30 (port 1I:box 1:bay 30, SATA HDD, 4 TB, OK)
The logical drive shows the same status.
shell> ssacli ctrl slot=1 ld all show
Array AD
logicaldrive 30 (3.64 TB, RAID 0, OK)
Add show detail to see more information about logical drive 30.
shell> ssacli ctrl slot=1 ld 30 show detail
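Now use zpool replace [pool name] [old device name] [new device name] to replace the disk in ZFS. Here the old and new devices share the name sdad. If ZFS does not accept the old name, the GUID 17764873689765375348 can be given as the old device instead.
shell> zpool replace data sdad sdad # no output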
Running zpool status now shows that the ZFS resilver has started.
shell> zpool status
pool: data
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Jul 26 10:06:33 2024
86.3G scanned at 3.20G/s, 43.1G issued at 1.60G/s, 76.5T total
0B resilvered, 0.05% done, 13:38:05 to go
config:
NAME STATE READ WRITE CKSUM
data DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
< some lines omitted >
sdy ONLINE 0 0 0
sdz ONLINE 0 0 0
sdaa ONLINE 0 0 0
sdab ONLINE 0 0 0
sdac ONLINE 0 0 0
replacing-29 DEGRADED 0 0 0
17764873689765375348 OFFLINE 0 0 0 was /dev/sdad1/old
sdad ONLINE 0 0 0
sdae ONLINE 0 0 0
sdaf ONLINE 0 0 0
< some lines omitted >
sdbg ONLINE 0 0 0
sdbh ONLINE 0 0 0
errors: No known data errors
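A resilver of this size runs for many hours, so it can be handy to pull only the progress figures out of the status text. The sketch below assumes the exact wording of the scan line shown above ("... resilvered, 0.05% done, 13:38:05 to go"); other OpenZFS versions may word it differently, in which case it prints nothing. The name resilver_progress is my own.

```shell
# Print "<percent> <time left>" from `zpool status` output read on stdin.
# Hypothetical helper: matches only lines shaped like
#   0B resilvered, 0.05% done, 13:38:05 to go
resilver_progress() {
  sed -n 's/.*resilvered, \([0-9.]*%\) done, \(.*\) to go.*/\1 \2/p'
}
```

For example, zpool status data | resilver_progress could be run from watch or a cron job to follow the rebuild without reading the whole table each time.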
Install sysstat so you can use iostat to check that the new disk really is being resilvered: look for write I/O on this device.
shell> apt install sysstat -y
Here is the result.
shell> iostat -dx 1 /dev/sdad
Linux 5.15.0-43-generic (SOC-CEPH-INFRAS-BACKUP-06) 07/26/2024 _x86_64_ (40 CPU)
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.02 0.00 0.00 1.57 24.02 0.08 0.79 0.00 0.31 13.91 9.72 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 101.00 444.00 0.00 0.00 13.61 4.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.38 31.60
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 258.00 2248.00 2.00 0.77 19.90 8.71 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.13 76.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 497.00 2664.00 0.00 0.00 13.56 5.36 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 6.74 94.80
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 488.12 3920.79 0.00 0.00 14.94 8.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.29 96.24
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 529.00 5760.00 2.00 0.38 14.88 10.89 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.87 94.40
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 484.00 7188.00 1.00 0.21 17.78 14.85 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.61 97.20
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 487.00 6896.00 2.00 0.41 16.97 14.16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.26 98.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 433.00 5712.00 2.00 0.46 17.68 13.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.65 93.20
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 426.00 5420.00 6.00 1.39 18.48 12.72 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.87 94.00
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sdad 0.00 0.00 0.00 0.00 0.00 0.00 617.00 6560.00 5.00 0.80 13.83 10.63 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.53 97.20
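Reading the w/s column across many samples by eye gets tiring. As a small sketch, the samples can be counted with awk. The helper name write_activity is hypothetical, and the w/s column is located from the header row because the column layout differs between sysstat versions.

```shell
# Count how many iostat samples for one device show write activity.
# $1: device name as printed by iostat (e.g. sdad).
# Reads `iostat -dx` output from stdin. Hypothetical helper.
write_activity() {
  awk -v dev="$1" '
    # Find the position of the "w/s" column from each header row.
    $1 == "Device" || $1 == "Device:" {
      for (i = 1; i <= NF; i++) if ($i == "w/s") col = i
      next
    }
    # For every sample of the device, count it and note whether w/s > 0.
    $1 == dev && col { n++; if ($col + 0 > 0) busy++ }
    END { printf "%d/%d samples with writes\n", busy, n }
  '
}
```

For example: iostat -dxy sdad 1 10 | write_activity sdad. The -y flag omits the first report, which holds averages since boot rather than current activity. A few idle samples are normal early on, as in the output above, but a long run of zeros would suggest the disk is not being written to.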