Thursday, November 21, 2024

Disk replacement procedure for ZFS in a production environment


Below is the zpool status output. You can see that sdad has failed: ZFS can no longer even identify it by its device name and shows a string of digits (the vdev GUID) instead, and the device is in the FAILED state.

shell> zpool status
  pool: data
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 799G in 3 days 03:18:40 with 0 errors on Thu Jul 25 20:53:36 2024
config:

        NAME                      STATE     READ WRITE CKSUM
        data                      DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sda                   ONLINE       0     0     0
            sdb                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0

            < lines omitted >

            sdaa                  ONLINE       0     0     0
            sdab                  ONLINE       0     0     0
            sdac                  ONLINE       0     0     0
            17764873689765375348  FAILED      0     0     0  was /dev/sdad1
            sdae                  ONLINE       0     0     0
            sdaf                  ONLINE       0     0     0

            < lines omitted >

            sdbf                  ONLINE       0     0     0
            sdbg                  ONLINE       0     0     0
            sdbh                  ONLINE       0     0     0

errors: No known data errors
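On a pool this wide, scanning the config table by eye is error-prone (zpool status -x data limits the output to unhealthy pools, but still prints the whole table). A minimal sketch, assuming the usual layout shown above (a NAME STATE READ WRITE CKSUM header followed by the pool root row): the hypothetical function list_unhealthy_vdevs prints every leaf vdev whose state is not ONLINE. The list of group-vdev prefixes it skips is an assumption.

```shell
# Hypothetical helper: list leaf vdevs that are not ONLINE.
# Reads `zpool status` text on stdin, e.g.: zpool status data | list_unhealthy_vdevs
list_unhealthy_vdevs() {
  awk '
    # The config table starts after the NAME/STATE header line.
    /^[ \t]+NAME[ \t]+STATE/ { in_cfg = 1; root_seen = 0; next }
    /^errors:/ { in_cfg = 0 }
    # Device rows have at least NAME STATE READ WRITE CKSUM, with a numeric READ.
    in_cfg && NF >= 5 && $3 ~ /^[0-9]/ {
      if (!root_seen) { root_seen = 1; next }                  # first row is the pool itself
      if ($1 ~ /^(raidz|mirror|replacing|draid|spare)/) next    # group vdevs (assumed prefixes)
      if ($2 != "ONLINE") print $1, $2
    }
  '
}
```

For the output above this prints the GUID together with its FAILED state, which is exactly the identifier to pass to zpool offline.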

Run zpool offline [pool name] [device name] to take this device offline.

zpool offline data 17764873689765375348

Device 17764873689765375348 (sdad) is now in the OFFLINE state.

shell> zpool status
  pool: data
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 799G in 3 days 03:18:40 with 0 errors on Thu Jul 25 20:53:36 2024
config:

        NAME                      STATE     READ WRITE CKSUM
        data                      DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sda                   ONLINE       0     0     0
            sdb                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0

            < lines omitted >

            sdaa                  ONLINE       0     0     0
            sdab                  ONLINE       0     0     0
            sdac                  ONLINE       0     0     0
            17764873689765375348  OFFLINE      0     0     0  was /dev/sdad1
            sdae                  ONLINE       0     0     0
            sdaf                  ONLINE       0     0     0

            < lines omitted >

            sdbf                  ONLINE       0     0     0
            sdbg                  ONLINE       0     0     0
            sdbh                  ONLINE       0     0     0

errors: No known data errors
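Before touching the RAID controller it is worth confirming that the vdev really is OFFLINE. A minimal sketch with two hypothetical helpers, vdev_state and require_offline; they only parse zpool status text and never call zpool themselves.

```shell
# Hypothetical helper: print the STATE column of one vdev (name or GUID)
# from `zpool status` text on stdin.
# Compare as strings: GUIDs are too large for exact numeric comparison in awk.
vdev_state() {
  awk -v name="$1" '($1 "") == (name "") && NF >= 5 { print $2; exit }'
}

# Hypothetical guard: succeed only if the given vdev is OFFLINE.
# Usage: zpool status data | require_offline 17764873689765375348
require_offline() {
  local state
  state=$(vdev_state "$1")
  if [ "$state" != "OFFLINE" ]; then
    echo "vdev $1 is '${state:-not found}', expected OFFLINE" >&2
    return 1
  fi
}
```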

Once the device is OFFLINE, run ssacli ctrl all show to find which slot your RAID controller is in. In my case it is slot 1.

shell> ssacli ctrl all show
Smart Array P440 in Slot 1                (sn: PDNMF0ARH140I4)
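If you script this step, the slot number can be read from that line. A sketch assuming the "<model> in Slot N" wording shown above; other controller models or ssacli versions may word it differently, and the helper name is hypothetical.

```shell
# Hypothetical helper: print controller slot numbers from `ssacli ctrl all show`
# text on stdin, e.g.: ssacli ctrl all show | controller_slots
controller_slots() {
  sed -n 's/.* in Slot \([0-9][0-9]*\).*/\1/p'
}
```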

Check the logical drives on slot 1 with ssacli ctrl slot=1 ld all show. You will see that Array AD and its logicaldrive 30 are in the Failed state.

shell> ssacli ctrl slot=1 ld all show 
   Array AD (Failed)
      logicaldrive 30 (3.64 TB, RAID 0, Failed)
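Deleting a logical drive destroys its data, so it is safer to take the number straight from the controller output than to type it from memory. A minimal sketch, assuming the "logicaldrive N (..., Failed)" line format shown above; failed_logical_drives is a hypothetical name.

```shell
# Hypothetical helper: print the numbers of logical drives marked Failed.
# Reads `ssacli ctrl slot=N ld all show` text on stdin, e.g.:
#   ssacli ctrl slot=1 ld all show | failed_logical_drives
failed_logical_drives() {
  awk '$1 == "logicaldrive" && /Failed\)/ { print $2 }'
}
```

For the output above it prints 30, the number used in the delete command.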

Now use ssacli to delete logicaldrive 30 (the failed one), and press y to confirm the deletion.

shell> ssacli ctrl slot=1 ld 30 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C


Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

If you now grep for sdad, it no longer appears.

shell> lsblk | grep ^sdad
< no output >

Recreate the RAID for this drive (a single-disk RAID 0 logical drive).

shell> ssacli ctrl slot=1 create type=ld raid=0 drives=1I:1:30
Warning: Creation of this logical drive has caused array letters to become
         renamed.
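A small guard against typos in the drive address: this hypothetical helper checks that the slot is numeric and the address looks like port:box:bay (the 1I:1:30 form used here; the exact pattern is an assumption and may not cover every controller's port naming), then prints the create command for you to review instead of running it.

```shell
# Hypothetical dry-run helper: print the ssacli command that recreates a
# single-disk RAID 0 logical drive, after a basic sanity check of its arguments.
print_create_raid0_cmd() {
  local slot=$1 addr=$2
  # Assumed address format: port (digits + letter, e.g. 1I), box, bay.
  if ! [[ $slot =~ ^[0-9]+$ && $addr =~ ^[0-9]+[A-Z]:[0-9]+:[0-9]+$ ]]; then
    echo "unexpected slot/address: '$slot' '$addr'" >&2
    return 1
  fi
  echo "ssacli ctrl slot=$slot create type=ld raid=0 drives=$addr"
}
```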

The device name for 1I:1:30 now shows up again in lsblk.


shell> lsblk | grep sdad
sdad         65:208  0   3.6T  0 disk 
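The new block device may take a moment to appear after the logical drive is created. A minimal polling sketch; the helper name and the 60 second default are assumptions, and udevadm settle is another common way to wait for udev to finish processing events.

```shell
# Hypothetical helper: wait until a block device node exists, polling once per
# second, and give up after a timeout in seconds (default 60, an assumption).
# Usage: wait_for_blockdev /dev/sdad 60 && lsblk /dev/sdad
wait_for_blockdev() {
  local dev=$1 timeout=${2:-60} waited=0
  while [ ! -b "$dev" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $dev" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```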

The physical drive now shows the OK status.

shell> ssacli ctrl slot=1 pd all show 
   Array AD
      physicaldrive 1I:1:30 (port 1I:box 1:bay 30, SATA HDD, 4 TB, OK)
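To check one specific bay from a script, a sketch that reads the status from the last item inside the parentheses, assuming the line format shown above; pd_status is a hypothetical name.

```shell
# Hypothetical helper: print the status of one physical drive (the last item
# inside the parentheses) from `ssacli ctrl slot=N pd all show` on stdin.
# Usage: ssacli ctrl slot=1 pd all show | pd_status 1I:1:30
pd_status() {
  awk -v addr="$1" '$1 == "physicaldrive" && $2 == addr {
    s = $NF
    sub(/\)$/, "", s)    # drop the closing parenthesis
    print s
    exit
  }'
}
```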

The same goes for the logical drive.

shell> ssacli ctrl slot=1 ld all show 
   Array AD
      logicaldrive 30 (3.64 TB, RAID 0, OK)

Use show detail for more detailed information about this drive.

ssacli ctrl slot=1 pd 1I:1:30 show detail

Now you can use zpool replace [pool name] [old device name] [new device name] to replace the disk in ZFS.

zpool replace data sdad sdad # no output
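Here the old and new device share the name sdad, and the old vdev was identified by its previous path. If you prefer an unambiguous identifier, zpool replace also accepts the vdev GUID that zpool status shows for the missing disk (the same GUID used with zpool offline earlier). A sketch that looks it up from the "was /dev/..." annotation; the helper name is hypothetical.

```shell
# Hypothetical helper: print the GUID of the vdev annotated "was <path>"
# in `zpool status` text on stdin.
guid_for_old_path() {
  awk -v path="$1" 'NF >= 2 && $(NF-1) == "was" && $NF == path { print $1; exit }'
}

# Usage sketch (not run here):
#   guid=$(zpool status data | guid_for_old_path /dev/sdad1)
#   zpool replace data "$guid" sdad
```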

If you run zpool status now, you will see that ZFS has started resilvering.

shell> zpool status
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jul 26 10:06:33 2024
        86.3G scanned at 3.20G/s, 43.1G issued at 1.60G/s, 76.5T total
        0B resilvered, 0.05% done, 13:38:05 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        data                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            sda                     ONLINE       0     0     0
            sdb                     ONLINE       0     0     0
            sdc                     ONLINE       0     0     0

            < lines omitted >

            sdy                     ONLINE       0     0     0
            sdz                     ONLINE       0     0     0
            sdaa                    ONLINE       0     0     0
            sdab                    ONLINE       0     0     0
            sdac                    ONLINE       0     0     0
            replacing-29            DEGRADED     0     0     0
              17764873689765375348  OFFLINE      0     0     0  was /dev/sdad1/old
              sdad                  ONLINE       0     0     0
            sdae                    ONLINE       0     0     0
            sdaf                    ONLINE       0     0     0

            < lines omitted >

            sdbg                    ONLINE       0     0     0
            sdbh                    ONLINE       0     0     0

errors: No known data errors
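To follow progress from a script, the percentage can be extracted from the scan line. A sketch assuming the "... X% done ..." wording shown above (it does not distinguish a resilver from a scrub); on OpenZFS 2.0 or later, zpool wait -t resilver data can instead block until the resilver finishes.

```shell
# Hypothetical helper: print the scan progress percentage (resilver or scrub)
# from `zpool status` text on stdin; prints nothing if no scan is running.
# Usage: zpool status data | scan_percent
scan_percent() {
  sed -n 's/.*[ ,]\([0-9][0-9.]*\)% done.*/\1/p'
}
```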

Install sysstat so you can check whether the newly replaced disk is actually being resilvered, that is, whether the device is receiving any I/O.

apt install sysstat -y

Here is the result:

shell> iostat -dx 1 /dev/sdad
Linux 5.15.0-43-generic (SOC-CEPH-INFRAS-BACKUP-06)     07/26/2024      _x86_64_        (40 CPU)

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.02     0.00   0.00    1.57    24.02    0.08      0.79     0.00   0.31   13.91     9.72    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.02


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  101.00    444.00     0.00   0.00   13.61     4.40    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.38  31.60


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  258.00   2248.00     2.00   0.77   19.90     8.71    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.13  76.00


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  497.00   2664.00     0.00   0.00   13.56     5.36    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    6.74  94.80


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  488.12   3920.79     0.00   0.00   14.94     8.03    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    7.29  96.24


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  529.00   5760.00     2.00   0.38   14.88    10.89    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    7.87  94.40


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  484.00   7188.00     1.00   0.21   17.78    14.85    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    8.61  97.20


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  487.00   6896.00     2.00   0.41   16.97    14.16    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    8.26  98.00


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  433.00   5712.00     2.00   0.46   17.68    13.19    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    7.65  93.20


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  426.00   5420.00     6.00   1.39   18.48    12.72    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    7.87  94.00


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdad             0.00      0.00     0.00   0.00    0.00     0.00  617.00   6560.00     5.00   0.80   13.83    10.63    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    8.53  97.20
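The first iostat report covers the time since boot, so only the later one-second samples reflect the resilver: once it ramps up, sdad shows writes in the hundreds per second and %util above 90%. A sketch that averages w/s for one device while skipping that first report; it locates the column by its w/s header rather than a fixed position, and the helper name is hypothetical.

```shell
# Hypothetical helper: average the w/s column for one device across
# `iostat -dx` reports on stdin, skipping the first (since-boot) report.
# Usage: iostat -dx 1 10 /dev/sdad | avg_write_iops sdad
avg_write_iops() {
  awk -v dev="$1" '
    $1 == "Device" { for (i = 1; i <= NF; i++) if ($i == "w/s") col = i; next }
    $1 == dev && col { n++; if (n > 1) { sum += $col; cnt++ } }
    END { if (cnt) printf "%.2f\n", sum / cnt; else print "0.00" }
  '
}
```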
