Replacing an HDD in a production RAID-Z1 machine

My FreeBSD server started spitting out errors like these.

(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 80 10 be 14 40 09 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 40 be 14 00 09 00 00 80 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 80 10 be 14 40 09 00 00 00 00 00

zpool also reported CKSUM errors.

        NAME           STATE     READ WRITE CKSUM
        zvm0           ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            gpt/pool0  ONLINE       0     0    20
            gpt/pool1  ONLINE       0     0     0
            gpt/pool2  ONLINE       0     0     0
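
With more devices in a pool, spotting the one non-zero counter by eye gets harder. The following is a minimal sketch that filters `zpool status` output down to the members with a non-zero READ, WRITE, or CKSUM count. The function name `zpool_error_devices` is made up for illustration, and it assumes the table layout shown above.

```shell
# List pool members whose READ, WRITE, or CKSUM counter is non-zero.
# Reads `zpool status` output on stdin.
# Hypothetical helper; assumes the column layout shown in this post.
zpool_error_devices() {
    awk '
        # The device table starts after the "NAME STATE READ WRITE CKSUM" row.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # A blank line ends the table.
        in_table && NF == 0 { in_table = 0 }
        # Compare as strings so abbreviated counts such as "1.2K" also match.
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}
```

Usage would be `zpool status zvm0 | zpool_error_devices`; for the status above it should report only `gpt/pool0`.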

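Time to give up and replace the disk. When I practiced this it was in a VirtualBox test environment, but this time it is a machine in full production use, so I'll be careful. First, a look at the GPT of the original HDD.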

gpart show ada0
=>        40  7814037088  ada0  GPT  (3.6T)
          40        1024     1  freebsd-boot  (512K)
        1064    16777216     2  freebsd-ufs  (8.0G)
    16778280    16777216     3  freebsd-swap  (8.0G)
    33555496    16777216     4  freebsd-ufs  (8.0G)
    50332712    67108864     5  freebsd-ufs  (32G)
   117441576  7696595544     6  freebsd-zfs  (3.6T)
  7814037120           8        - free -  (4.0K)
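
To make the later comparison with the new disk mechanical rather than visual, the `gpart show` output can be reduced to one "index size type" line per partition. This is only a sketch: `gpart_layout` is a hypothetical helper name, and the `/tmp` paths in the usage comment are placeholders.

```shell
# Reduce `gpart show <disk>` output (stdin) to "index size type" lines,
# dropping the "=>" header row and the "- free -" rows.
# Hypothetical helper; assumes the column layout shown in this post.
gpart_layout() {
    awk '$1 != "=>" && $3 ~ /^[0-9]+$/ { print $3, $2, $4 }'
}

# Possible usage once the new disk (da0 in this post) is partitioned:
#   gpart show ada0 | gpart_layout > /tmp/ada0.layout
#   gpart show da0  | gpart_layout > /tmp/da0.layout
#   diff /tmp/ada0.layout /tmp/da0.layout
```

For the listings in this post, the only difference would be partition 6 (7696595544 sectors on ada0, 7696595552 on da0). The new partition is the larger one, which is the direction that matters: `zpool replace` needs the new device to be at least as large as the old one.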
  1. Attach the new HDD externally, then create the GPT scheme
    gpart create -s gpt da0
    gpart add -t freebsd-boot -s 512k -l boot3 da0
    da0p1 added
    gpart add -t freebsd-ufs -s 8g -l root3 da0
    da0p2 added
    gpart add -t freebsd-swap -s 8g -l swap3 da0
    da0p3 added
    gpart add -t freebsd-ufs -s 8g -l var3 da0
    da0p4 added
    gpart add -t freebsd-ufs -s 32g -l usr3 da0
    da0p5 added
    gpart add -t freebsd-zfs -l pool3 da0
    da0p6 added
    gpart show da0
    =>        40  7814037088  da0  GPT  (3.6T)
              40        1024    1  freebsd-boot  (512K)
            1064    16777216    2  freebsd-ufs  (8.0G)
        16778280    16777216    3  freebsd-swap  (8.0G)
        33555496    16777216    4  freebsd-ufs  (8.0G)
        50332712    67108864    5  freebsd-ufs  (32G)
       117441576  7696595552    6  freebsd-zfs  (3.6T)
    
  2. Write the bootcode
    gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 da0
    

    The boot partition is UFS rather than ZFS, so gptboot is used (not gptzfsboot).

  3. Swap the gmirror providers first
    gmirror status
           Name    Status  Components
    mirror/root  COMPLETE  ada2p2 (ACTIVE)
                           ada4p2 (ACTIVE)
                           ada0p2 (ACTIVE)
     mirror/var  COMPLETE  ada2p4 (ACTIVE)
                           ada4p4 (ACTIVE)
                           ada0p4 (ACTIVE)
     mirror/usr  COMPLETE  ada2p5 (ACTIVE)
                           ada4p5 (ACTIVE)
                           ada0p5 (ACTIVE)
    mirror/ssd0  COMPLETE  ada1p1 (ACTIVE)
                           ada3p1 (ACTIVE)
    

    Insert the freshly created da0p{2,4,5} and remove the corresponding ada0 partitions.

    gmirror insert root da0p2
    gmirror remove root ada0p2
    gmirror insert var da0p4
    gmirror remove var ada0p4
    gmirror insert usr da0p5
    gmirror remove usr ada0p5
    
  4. Right, now ZFS. Replace, with a prayer.
    zpool replace zvm0 gpt/pool0 gpt/pool3
    

    Here goes nothing!

    zpool status
      pool: zvm0
     state: ONLINE
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Thu Sep 19 19:15:55 2019
            2.88T scanned at 3.86G/s, 17.7G issued at 97.7M/s, 2.88T total
            5.86G resilvered, 0.60% done, 0 days 08:32:17 to go
    config: 
    
            NAME             STATE     READ WRITE CKSUM
            zvm0             ONLINE       0     0     0
              raidz1-0       ONLINE       0     0     0
                replacing-0  ONLINE       0     0     0
                  gpt/pool0  ONLINE       0     0    20
                  gpt/pool3  ONLINE       0     0     0
                gpt/pool1    ONLINE       0     0     0
                gpt/pool2    ONLINE       0     0     0
    
    errors: No known data errors
    

    Oh, it's moving along nicely.
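
Rather than rerunning `zpool status` by hand, the "% done" figure can be pulled out of its output. A minimal sketch, assuming the progress line looks like the one above (the exact wording varies between ZFS versions); `resilver_percent` is a hypothetical helper name and the 600-second interval is arbitrary.

```shell
# Print the "% done" figure from `zpool status` output on stdin.
# Prints nothing when no resilver is in progress.
# Hypothetical helper; assumes a "... resilvered, N% done ..." progress line.
resilver_percent() {
    sed -n 's/.* resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Possible polling loop (stops once the progress line disappears):
#   while pct=$(zpool status zvm0 | resilver_percent); [ -n "$pct" ]; do
#       echo "resilver: ${pct}%"
#       sleep 600
#   done
```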

Checked the next morning: the resilver completed without trouble.
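
The gmirror components inserted in step 3 also need time to synchronize, so before powering anything down it seems worth confirming that every mirror is back to COMPLETE. The sketch below checks the Status column of `gmirror status`; `gmirror_all_complete` is a hypothetical helper name, and it assumes the three-column layout shown in step 3.

```shell
# Succeed only if every mirror in `gmirror status` output (stdin) is COMPLETE.
# Prints the name and status of any mirror in another state (e.g. DEGRADED).
# Hypothetical helper; assumes the "Name Status Components" layout.
gmirror_all_complete() {
    awk '
        # Only lines starting with "mirror/" carry a Status column;
        # continuation lines list further components and are ignored.
        $1 ~ /^mirror\// && $2 != "COMPLETE" { print $1, $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Possible usage:
#   gmirror status | gmirror_all_complete && echo "all mirrors complete"
```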

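I powered the machine down and swapped the externally attached HDD with the one inside the chassis, so the replacement is done at the hardware level too. That completes the whole procedure.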

  pool: zvm0
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 981G in 0 days 10:17:32 with 0 errors on Fri Sep 20 05:33:27 2019
config: 

        NAME           STATE     READ WRITE CKSUM
        zvm0           ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            gpt/pool3  ONLINE       0     0     0
            gpt/pool1  ONLINE       0     0     0
            gpt/pool2  ONLINE       0     0     0

errors: No known data errors

While I was at it, I ran zpool upgrade as well.

zpool upgrade zvm0
This system supports ZFS pool feature flags.

Enabled the following features on 'zvm0':
  device_removal
  obsolete_counts
  zpool_checkpoint
  spacemap_v2

Hm? device_removal? Does this let me take out a vdev I added by mistake? I'll look into it later.