Proxmox VE Hyper-Converged Ceph Cluster upgrade 6.4 to 7.0

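Following on from the previous post, Proxmox VE 6.4 Ceph upgrade Nautilus to Octopus, this time I upgrade Proxmox VE 6.4 to 7.0. The procedure follows the official Upgrade from 6.x to 7.0 guide. The upgrade completed smoothly without any problems, which was a relief.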

Environment

I ran apt-get update && apt-get dist-upgrade beforehand, so the starting versions are as follows.

Proxmox VE: 6.4-13
Ceph: Octopus(15.2.13)

Both the Proxmox VE cluster and the Ceph cluster consist of three nodes.

Pre-checks with the pve6to7 command

First, run the pve6to7 command to check for likely problems.

root@pve01:~# pve6to7 --full
= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages uptodate

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 6.4-1

Checking running kernel version..
PASS: expected running kernel '5.4.128-1-pve'.

= CHECKING CLUSTER HEALTH/SETTINGS =

PASS: systemd unit 'pve-cluster.service' is in state 'active'
PASS: systemd unit 'corosync.service' is in state 'active'
PASS: Cluster Filesystem is quorate.

Analzying quorum settings and state..
INFO: configured votes - nodes: 3
INFO: configured votes - qdevice: 0
INFO: current expected votes: 3
INFO: current total votes: 3

Checking nodelist entries..
WARN: pve01: ring0_addr 'pve01' resolves to '192.168.122.26'.
 Consider replacing it with the currently resolved IP address.
WARN: pve02: ring0_addr 'pve02' resolves to '192.168.122.27'.
 Consider replacing it with the currently resolved IP address.

Checking totem settings..
PASS: totem settings OK

INFO: run 'pvecm status' to get detailed cluster status..

= CHECKING HYPER-CONVERGED CEPH STATUS =

INFO: hyper-converged ceph setup detected!
INFO: getting Ceph status/health information..
PASS: Ceph health reported as 'HEALTH_OK'.
INFO: getting Ceph daemon versions..
PASS: single running version detected for daemon type monitor.
PASS: single running version detected for daemon type manager.
PASS: single running version detected for daemon type MDS.
PASS: single running version detected for daemon type OSD.
PASS: single running overall version detected for all Ceph daemon types.
WARN: 'noout' flag not set - recommended to prevent rebalancing during cluster-wide upgrades.
INFO: checking Ceph config..

= CHECKING CONFIGURED STORAGES =

PASS: storage 'cephfs' enabled and active.
PASS: storage 'local' enabled and active.
PASS: storage 'local-lvm' enabled and active.
PASS: storage 'rdb_ct' enabled and active.
PASS: storage 'rdb_vm' enabled and active.
PASS: storage 'www' enabled and active.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for running guests..
WARN: 8 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if the local node's hostname 'pve01' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '192.168.122.26' configured and active on single interface.
INFO: Checking backup retention settings..
WARN: storage 'www' - parameter 'maxfiles' is deprecated with PVE 7.x and will be removed in a future version, use 'prune-backups' instead.
INFO: storage 'local' - no backup retention settings defined - by default, PVE 7.x will no longer keep only the last backup, but all backups
INFO: checking CIFS credential location..
WARN: CIFS credentials '/etc/pve/priv/www.cred' will be moved to '/etc/pve/priv/storage/www.pw' during the update
INFO: Checking custom roles for pool permissions..
INFO: Checking node and guest description/note legnth..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking storage content type configuration..
PASS: no problems found
INFO: Checking if the suite for the Debian security repository is correct..
INFO: Make sure to change the suite of the Debian security repository from 'buster/updates' to 'bullseye-security' - in /etc/apt/sources.list:5

= SUMMARY =

TOTAL:    33
PASSED:   27
SKIPPED:  0
WARNINGS: 6
FAILURES: 0

ATTENTION: Please check the output for detailed information!

There are 6 warnings. Let's look at each one.

Reviewing the pve6to7 results
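With a report this long it can help to pull out only the lines that need attention. The following is a minimal sketch, assuming the `WARN:` and `FAIL:` prefixes seen in the output above; the function name `pve6to7_problems` is made up for illustration and is not part of Proxmox VE. Indented continuation lines (such as the "Consider replacing it" hint) are dropped by this filter.

```shell
#!/bin/sh
# Print only the WARN and FAIL lines of a pve6to7 report read from stdin.
# The "WARN:" / "FAIL:" prefixes are taken from the report shown above.
# "|| true" keeps the exit status at 0 when there is nothing to report.
pve6to7_problems() {
    grep -E '^(WARN|FAIL):' || true
}

# Typical use on a node (not executed here):
#   pve6to7 --full | pve6to7_problems
```

Because the filter only reads stdin, it can also be run against a saved copy of the report.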

WARN: pve01: ring0_addr ‘pve01’ resolves to ‘192.168.122.26’.

This warning appears because ringX_addr in the corosync configuration is written as a hostname.
Looking at /etc/pve/corosync.conf, the nodelist was written as follows.

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve01
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve02
  }
  node {
    name: pve03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.122.28
  }
}

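Indeed, ring0_addr is written as a hostname for pve01 and pve02. Mixing IP addresses and hostnames is also untidy, so I take this opportunity to switch everything to IP addresses.

Editing corosync.conf

In a Proxmox VE cluster, files under /etc/pve are synchronized across the nodes. Carelessly overwriting /etc/pve/corosync.conf can break the cluster, so care is needed.

Follow the Edit corosync.conf section of the Proxmox VE wiki while doing this.

First, make a backup copy and a working copy to edit.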
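The same check can be done by hand before touching the file. This is a rough sketch, assuming IPv4 addressing as in this cluster (an IPv6 address would be reported as if it were a hostname) and assuming that `name:` appears before the ring address inside each node block, as it does in the listing above. The function name `hostname_ring_addrs` is hypothetical.

```shell
#!/bin/sh
# List nodelist entries whose ringX_addr is not a dotted IPv4 address.
# Reads a corosync.conf from stdin and prints "<node name> <ring key> <value>".
# Assumption: "name:" precedes the ring address within each node block.
hostname_ring_addrs() {
    awk '
        $1 == "name:" { name = $2 }
        $1 ~ /^ring[0-9]+_addr:$/ && $2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            sub(/:$/, "", $1)      # drop the trailing colon from the key
            print name, $1, $2
        }
    '
}

# Typical use on a node (not executed here):
#   hostname_ring_addrs < /etc/pve/corosync.conf
```

For the configuration shown above this would list pve01 and pve02, matching the two warnings from pve6to7.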

root@pve01:~# ls -ls /etc/pve/corosync.conf 
1 -rw-r----- 1 root www-data 538 Feb 11  2019 /etc/pve/corosync.conf
root@pve01:~# cd
root@pve01:~# pwd
/root
root@pve01:~# cp -p /etc/pve/corosync.conf ./corosync.conf.bak
root@pve01:~# cp -p /etc/pve/corosync.conf ./corosync.conf.new

Info

The commands differ slightly from the official procedure: because /etc/pve/corosync.conf is owned by root:www-data, I used cp -p so that the file attributes are copied as well.

Edit ./corosync.conf.new as follows.

root@pve01:~# diff -u /etc/pve/corosync.conf corosync.conf.new 
--- /etc/pve/corosync.conf      2019-02-11 19:19:27.000000000 +0900
+++ corosync.conf.new   2021-08-12 18:47:46.844239590 +0900
@@ -8,13 +8,13 @@
     name: pve01
     nodeid: 1
     quorum_votes: 1
-    ring0_addr: pve01
+    ring0_addr: 192.168.122.26
   }
   node {
     name: pve02
     nodeid: 2
     quorum_votes: 1
-    ring0_addr: pve02
+    ring0_addr: 192.168.122.27
   }
   node {
     name: pve03
@@ -30,7 +30,7 @@
 
 totem {
   cluster_name: pve-cluster
-  config_version: 5
+  config_version: 6
   interface {
     bindnetaddr: 192.168.122.26
     ringnumber: 0

Info

As described in Edit corosync.conf, don't forget to increment config_version as well.

After editing, copy the file into place and you are done.
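Forgetting the version bump is easy, so here is a small sketch that does only that step. It writes the result to stdout and never edits in place, in keeping with the copy, edit, then copy-back flow used here. The function name `bump_config_version` is hypothetical.

```shell
#!/bin/sh
# Print the corosync.conf given as $1 with config_version increased by one.
# Output goes to stdout; the input file is left untouched.
bump_config_version() {
    awk '
        # On the config_version line, replace the number with number + 1.
        $1 == "config_version:" { sub(/[0-9]+/, $2 + 1) }
        { print }
    ' "$1"
}

# Typical use (not executed here):
#   bump_config_version corosync.conf.bak > corosync.conf.new
```

The ring address changes still have to be made by hand, and reviewing the result with diff -u before copying it back remains a good idea.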

root@pve01:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-08-11 16:14:55 JST; 1 day 2h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 1219 (corosync)
    Tasks: 9 (limit: 4915)
   Memory: 146.0M
   CGroup: /system.slice/corosync.service
           └─1219 /usr/sbin/corosync -f

Aug 12 05:27:14 pve01 corosync[1219]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 12 05:50:47 pve01 corosync[1219]:   [KNET  ] link: host: 2 link: 0 is down
Aug 12 05:50:47 pve01 corosync[1219]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 12 05:50:47 pve01 corosync[1219]:   [KNET  ] host: host: 2 has no active links
Aug 12 05:50:49 pve01 corosync[1219]:   [KNET  ] rx: host: 2 link: 0 is up
Aug 12 05:50:49 pve01 corosync[1219]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 12 05:50:49 pve01 corosync[1219]:   [TOTEM ] Token has not been received in 2737 ms
Aug 12 09:34:48 pve01 corosync[1219]:   [TOTEM ] Retransmit List: 4ccde
Aug 12 14:45:03 pve01 corosync[1219]:   [TOTEM ] Token has not been received in 2737 ms
Aug 12 18:02:22 pve01 corosync[1219]:   [TOTEM ] Token has not been received in 2737 ms
root@pve01:~# cp -p corosync.conf.new /etc/pve/corosync.conf
root@pve01:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-08-11 16:14:55 JST; 1 day 2h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 1219 (corosync)
    Tasks: 9 (limit: 4915)
   Memory: 146.9M
   CGroup: /system.slice/corosync.service
           └─1219 /usr/sbin/corosync -f

Aug 12 05:50:47 pve01 corosync[1219]:   [KNET  ] host: host: 2 has no active links
Aug 12 05:50:49 pve01 corosync[1219]:   [KNET  ] rx: host: 2 link: 0 is up
Aug 12 05:50:49 pve01 corosync[1219]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 12 05:50:49 pve01 corosync[1219]:   [TOTEM ] Token has not been received in 2737 ms
Aug 12 09:34:48 pve01 corosync[1219]:   [TOTEM ] Retransmit List: 4ccde
Aug 12 14:45:03 pve01 corosync[1219]:   [TOTEM ] Token has not been received in 2737 ms
Aug 12 18:02:22 pve01 corosync[1219]:   [TOTEM ] Token has not been received in 2737 ms
Aug 12 18:48:41 pve01 corosync[1219]:   [CFG   ] Config reload requested by node 1
Aug 12 18:48:41 pve01 corosync[1219]:   [TOTEM ] Configuring link 0
Aug 12 18:48:41 pve01 corosync[1219]:   [TOTEM ] Configured link number 0: local addr: 192.168.122.26, port=5405

Run pve6to7 again and confirm that the warning is gone.

root@pve01:~# pve6to7 --full | grep -A1 "Checking nodelist entries"
Checking nodelist entries..
PASS: nodelist settings OK

WARN: ’noout’ flag not set - recommended to prevent rebalancing during cluster-wide upgrades.

No major changes to the Ceph cluster are planned during this work, so I treat this warning as having no impact.
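The flag was left unset in this upgrade. For readers who would rather follow the recommendation, a minimal sketch using the standard `ceph osd set noout` and `ceph osd unset noout` commands is shown below. The `CEPH` variable is a hypothetical override added only so the wrappers can be dry-run; it is not a Ceph feature.

```shell
#!/bin/sh
# Thin wrappers around the Ceph CLI calls that set and clear the noout flag.
# CEPH is a made-up override for dry runs (e.g. CEPH=echo); it defaults to
# the real "ceph" binary.
noout_on()  { "${CEPH:-ceph}" osd set noout; }
noout_off() { "${CEPH:-ceph}" osd unset noout; }

# Typical use (not executed here):
#   noout_on      # before upgrading the first node
#   ...upgrade and reboot each node in turn...
#   noout_off     # after the last node is back and healthy
```

While the flag is set, Ceph reports HEALTH_WARN, so the HEALTH_OK checks shown elsewhere in this post would look different until it is cleared.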

WARN: storage ‘www’ enabled but not active!

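The CIFS mount drops frequently for some reason, but that is a problem with my environment, so I ignore it here. The storage is used as a backup destination, and the upgrade is done at a time when no backups run, so I consider it harmless. The annoying part is that the mount is simply slower than the check: running the command a second time works fine.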

WARN: 8 running guest(s) detected - consider migrating or stopping them.

When upgrading the nodes one at a time, I live-migrate the VMs to another node first, so I ignore this for now.
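Moving the guests off a node can be scripted. The sketch below only extracts the VMIDs of running VMs; it assumes the usual `qm list` column layout (VMID, NAME, STATUS, ... with one header row), which should be checked against the actual output before relying on it. The function name `running_vmids` is hypothetical, and containers are not covered.

```shell
#!/bin/sh
# Print the VMIDs of running VMs from `qm list` output read on stdin.
# Assumption: columns are VMID NAME STATUS ... with a single header row.
running_vmids() {
    awk 'NR > 1 && $3 == "running" { print $1 }'
}

# Typical use on the node about to be upgraded (not executed here):
#   qm list | running_vmids | while read -r id; do
#       qm migrate "$id" pve02 --online
#   done
```

The target node pve02 is just the example from this cluster; spreading the guests over several nodes may be preferable when memory is tight.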

WARN: storage ‘www’ - parameter ‘maxfiles’ is deprecated with PVE 7.x and will be removed in a future version, use ‘prune-backups’ instead.

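The maxfiles parameter is deprecated in 7.x and will be removed in a future version, so I update the setting.

The easy way is to apply a settings change through the GUI. Open Storage -> Edit, make a harmless change, and click OK. (Here I increased maxfiles by 1.)

(Screenshot: pve_edit_storage_config)

After that, prune-backups is used instead of maxfiles.
If it bothers you, you can change the value back to the original number afterwards.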
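To see roughly what the change looks like in the storage configuration, the filter below rewrites `maxfiles N` lines into the `prune-backups` form. It is only a sketch: it assumes that `maxfiles N` corresponds to `keep-last=N` and that `maxfiles 0` (unlimited) corresponds to `keep-all=1`, and the exact line the GUI writes may differ. The function name `maxfiles_to_prune` is hypothetical, and the output goes to stdout only.

```shell
#!/bin/sh
# Rewrite "maxfiles N" lines from a storage.cfg read on stdin into the
# prune-backups form, preserving indentation. Nothing is edited in place.
# Assumption: maxfiles N maps to keep-last=N, and maxfiles 0 to keep-all=1.
maxfiles_to_prune() {
    sed -E \
        -e 's/^([[:space:]]*)maxfiles[[:space:]]+0[[:space:]]*$/\1prune-backups keep-all=1/' \
        -e 's/^([[:space:]]*)maxfiles[[:space:]]+([0-9]+)[[:space:]]*$/\1prune-backups keep-last=\2/'
}

# Typical use (not executed here), as a preview only:
#   maxfiles_to_prune < /etc/pve/storage.cfg | diff -u /etc/pve/storage.cfg -
```

Applying the change through the GUI, as done here, avoids hand-editing a file under /etc/pve.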

WARN: CIFS credentials `/etc/pve/priv/www.cred` will be moved to `/etc/pve/priv/storage/www.pw` during the update

Roadmap - Proxmox VE 7.0 says:

CIFS credentials have been stored in the namespaced /etc/pve/priv/storage/<storage>.pw instead of /etc/pve/<storage>.cred since Proxmox VE 6.2 - existing credentials will get moved during the upgrade allowing you to drop fallback code.

Given that, it should be enough to confirm after the upgrade that the file has been moved as described. For now, it is located as follows.

root@pve01:~# ls -l /etc/pve/priv/www.cred 
-rw------- 1 root www-data 17 Dec 14  2019 /etc/pve/priv/www.cred

After addressing the pve6to7 warnings

In the end, the output looked like this.

root@pve01:~# pve6to7 --full 
= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages uptodate

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 6.4-1

Checking running kernel version..
PASS: expected running kernel '5.4.128-1-pve'.

= CHECKING CLUSTER HEALTH/SETTINGS =

PASS: systemd unit 'pve-cluster.service' is in state 'active'
PASS: systemd unit 'corosync.service' is in state 'active'
PASS: Cluster Filesystem is quorate.

Analzying quorum settings and state..
INFO: configured votes - nodes: 3
INFO: configured votes - qdevice: 0
INFO: current expected votes: 3
INFO: current total votes: 3

Checking nodelist entries..
PASS: nodelist settings OK

Checking totem settings..
PASS: totem settings OK

INFO: run 'pvecm status' to get detailed cluster status..

= CHECKING HYPER-CONVERGED CEPH STATUS =

INFO: hyper-converged ceph setup detected!
INFO: getting Ceph status/health information..
PASS: Ceph health reported as 'HEALTH_OK'.
INFO: getting Ceph daemon versions..
PASS: single running version detected for daemon type monitor.
PASS: single running version detected for daemon type manager.
PASS: single running version detected for daemon type MDS.
PASS: single running version detected for daemon type OSD.
PASS: single running overall version detected for all Ceph daemon types.
WARN: 'noout' flag not set - recommended to prevent rebalancing during cluster-wide upgrades.
INFO: checking Ceph config..

= CHECKING CONFIGURED STORAGES =

PASS: storage 'cephfs' enabled and active.
PASS: storage 'local' enabled and active.
PASS: storage 'local-lvm' enabled and active.
PASS: storage 'rdb_ct' enabled and active.
PASS: storage 'rdb_vm' enabled and active.
PASS: storage 'www' enabled and active.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for running guests..
WARN: 8 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if the local node's hostname 'pve01' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '192.168.122.26' configured and active on single interface.
INFO: Checking backup retention settings..
INFO: storage 'local' - no backup retention settings defined - by default, PVE 7.x will no longer keep only the last backup, but all backups
PASS: no problems found.
INFO: checking CIFS credential location..
WARN: CIFS credentials '/etc/pve/priv/www.cred' will be moved to '/etc/pve/priv/storage/www.pw' during the update
INFO: Checking custom roles for pool permissions..
INFO: Checking node and guest description/note legnth..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking storage content type configuration..
PASS: no problems found
INFO: Checking if the suite for the Debian security repository is correct..
INFO: Make sure to change the suite of the Debian security repository from 'buster/updates' to 'bullseye-security' - in /etc/apt/sources.list:5

= SUMMARY =

TOTAL:    32
PASSED:   29
SKIPPED:  0
WARNINGS: 3
FAILURES: 0

ATTENTION: Please check the output for detailed information!

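Other considerations

The Linux bridge MAC address may change

A change causes no problems here, so I do nothing. I had also installed ifupdown2 earlier because I wanted to change network settings dynamically, so Solution A: Use ifupdown2 probably applies as well.

Network interface names may change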

Roadmap - Proxmox VE 7.0 - Known Issues

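This one is less clear-cut.
In this setup bond0 is built from eno1 and eno2, so it would be troublesome if the interface names changed after a reboot. However, names like eno1 denote onboard interfaces, so they are unlikely to change much. In any case, the servers are within physical reach, so I decided to try the first node and then think about it.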

Info

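In short, eno1 and eno2 kept their names, so nothing needed to be done.

Containers using an old systemd fail to start because of cgroupv2

Roadmap - Proxmox VE 7.0 - Known Issues

Specifically, things like CentOS 7 and Ubuntu 16.10. I don't run any of those as containers, so this does not apply.

Starting the upgrade

Now rewrite the apt-related files and start the upgrade. Running VMs have already been moved off the node being upgraded, with commands such as qm migrate 1001 pve02 --online.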

root@pve01:~# cat /etc/apt/sources.list
deb http://deb.debian.org/debian buster main contrib
deb http://deb.debian.org/debian buster-updates main contrib

# security updates
deb http://security.debian.org buster/updates main contrib
root@pve01:~# sed -i 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' /etc/apt/sources.list
root@pve01:~# cat /etc/apt/sources.list
deb http://deb.debian.org/debian bullseye main contrib
deb http://deb.debian.org/debian bullseye-updates main contrib

# security updates
deb http://security.debian.org bullseye-security main contrib
root@pve01:~# cat /etc/apt/sources.list.d/pve-no-subscription.list 
deb http://download.proxmox.com/debian buster pve-no-subscription
root@pve01:~# echo 'deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription' > /etc/apt/sources.list.d/pve-no-subscription.list
root@pve01:~# cat /etc/apt/sources.list.d/pve-no-subscription.list 
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription
root@pve01:~# cat /etc/apt/sources.list.d/ceph.list 
deb http://download.proxmox.com/debian/ceph-octopus buster main
root@pve01:~# echo "deb http://download.proxmox.com/debian/ceph-octopus bullseye main" > /etc/apt/sources.list.d/ceph.list 
root@pve01:~# cat /etc/apt/sources.list.d/ceph.list 
deb http://download.proxmox.com/debian/ceph-octopus bullseye main
root@pve01:~# apt update
Get:1 http://deb.debian.org/debian bullseye InRelease [154 kB]
Get:2 http://security.debian.org bullseye-security InRelease [44.1 kB]                   
Get:3 http://security.debian.org bullseye-security/main amd64 Packages [16.8 kB]                         
Get:4 http://security.debian.org bullseye-security/main Translation-en [8,244 B]                          
Get:5 http://deb.debian.org/debian bullseye-updates InRelease [40.1 kB]                             
Get:6 http://deb.debian.org/debian bullseye/main amd64 Packages [8,178 kB]
Get:7 http://download.proxmox.com/debian/ceph-octopus bullseye InRelease [2,891 B]                  
Get:8 http://deb.debian.org/debian bullseye/main Translation-en [6,241 kB]                          
Get:9 http://download.proxmox.com/debian/pve bullseye InRelease [3,053 B]             
Get:10 http://download.proxmox.com/debian/ceph-octopus bullseye/main amd64 Packages [13.9 kB]    
Get:11 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages [93.3 kB]
Get:12 http://deb.debian.org/debian bullseye/contrib amd64 Packages [50.4 kB]                       
Get:13 http://deb.debian.org/debian bullseye/contrib Translation-en [46.9 kB]
Fetched 14.9 MB in 5s (2,888 kB/s)                                     
Reading package lists... Done
Building dependency tree       
Reading state information... Done
792 packages can be upgraded. Run 'apt list --upgradable' to see them.
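Since `sed -i` changes the file immediately, it can be reassuring to preview the result first. The sketch below wraps the same two substitutions used above in a stdin-to-stdout filter; the function name `to_bullseye` is made up for illustration.

```shell
#!/bin/sh
# Apply the same substitutions as the sed -i call above, but as a filter
# (stdin to stdout), so nothing on disk changes.
# Order matters: "buster/updates" must be rewritten before plain "buster".
to_bullseye() {
    sed 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g'
}

# Typical use (not executed here), to preview the change as a diff:
#   to_bullseye < /etc/apt/sources.list | diff -u /etc/apt/sources.list -
```

Once the diff looks right, the in-place command can be run as shown in the transcript.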

Just in case, check the free space on / as well. With about 23 GB available, there is plenty of room.

root@pve01:~# df -hl
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.2G   57M  3.1G   2% /run
/dev/mapper/pve-root   30G  6.8G   23G  24% /
tmpfs                  16G   45M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  16G     0   16G   0% /sys/fs/cgroup
/dev/sda1              97M  5.5M   92M   6% /var/lib/ceph/osd/ceph-0
/dev/fuse              30M   68K   30M   1% /etc/pve
tmpfs                 3.2G     0  3.2G   0% /run/user/0
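When repeating this on several nodes, the free-space check can be turned into a simple guard. This is a sketch only: the function name `has_free_gib` is hypothetical, and the threshold in the usage line is an arbitrary example, not an official requirement.

```shell
#!/bin/sh
# Succeed if the filesystem holding path $1 has at least $2 GiB available.
# df -Pk prints one line per filesystem with sizes in KiB (POSIX format).
has_free_gib() {
    avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $(( $2 * 1024 * 1024 )) ]
}

# Typical use (not executed here); 10 is an arbitrary example threshold:
#   has_free_gib / 10 || echo "low on space on /, clean up before upgrading"
```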

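All that is left is to say a prayer and start the dist-upgrade.

You are asked to confirm that you really want to upgrade; say another prayer and press Enter.
You are asked whether /etc/issue (the text shown at the login prompt) may be replaced; this is harmless, so I answered Y.
I let /etc/apt/sources.list.d/pve-enterprise.list be installed for now and disable it after the reboot.
Some services (cron, postfix, and so on) may ask to be restarted; if that is not a problem, accept with OK and continue.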

root@pve01:~# apt dist-upgrade
(output omitted)
Fetched 619 MB in 1min 34s (6,605 kB/s)
W: (pve-apt-hook) !! ATTENTION !!
W: (pve-apt-hook) You are attempting to upgrade from proxmox-ve '6.4-1' to proxmox-ve '7.0-2'. Please make sure to read the Upgrade notes at
W: (pve-apt-hook)       https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
W: (pve-apt-hook) before proceeding with this operation.
W: (pve-apt-hook) 
W: (pve-apt-hook) Press enter to continue, or C^c to abort.

(output omitted)
Configuration file '/etc/issue'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** issue (Y/I/N/O/D/Z) [default=N] ? Y
(output omitted)
Configuration file '/etc/apt/sources.list.d/pve-enterprise.list'
 ==> Deleted (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** pve-enterprise.list (Y/I/N/O/D/Z) [default=N] ? Y
(output omitted)

Reboot and wait for the node to come back up.

root@pve01:~# reboot

Checks after the upgrade

Once the node is back, check the cluster. This time it booted without incident and I could log in over ssh.

root@pve01:~# pveversion 
pve-manager/7.0-11/63d82f4e (running kernel: 5.11.22-3-pve)
root@pve01:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Aug 12 19:55:07 2021
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.792f
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.122.26 (local)
0x00000002          1 192.168.122.27
0x00000003          1 192.168.122.28
root@pve01:~# ceph -s
  cluster:
    id:     379c9ad8-dd15-4bf1-b9fc-f5206a75fe3f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pve01,pve02,pve03 (age 93s)
    mgr: pve03(active, since 86s), standbys: pve01, pve02
    mds: cephfs:1 {0=pve02=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 93s), 3 in (since 3M)
 
  data:
    pools:   4 pools, 129 pgs
    objects: 158.06k objects, 615 GiB
    usage:   1.8 TiB used, 1.0 TiB / 2.8 TiB avail
    pgs:     129 active+clean
 
  io:
    client:   0 B/s rd, 721 KiB/s wr, 0 op/s rd, 24 op/s wr
 

This looks fine.

The credentials file mentioned in the warning has also been moved.

root@pve01:~# ls -l /etc/pve/priv/storage/www.pw 
-rw------- 1 root www-data 17 Aug 12 19:49 /etc/pve/priv/storage/www.pw

After the reboot, check the free space again. It did not shrink as much as expected.

root@pve01:~# df -hl
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.2G  1.1M  3.2G   1% /run
/dev/mapper/pve-root   30G  7.9G   22G  27% /
tmpfs                  16G   45M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/fuse             128M   68K  128M   1% /etc/pve
/dev/sda1              97M  5.5M   92M   6% /var/lib/ceph/osd/ceph-0
tmpfs                 3.2G     0  3.2G   0% /run/user/0

The pve-enterprise.list file installed during the upgrade is not needed, so comment it out.

root@pve01:~# sed -i 's/deb /#deb /g' /etc/apt/sources.list.d/pve-enterprise.list

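Live migration from an older Proxmox VE to a newer one should succeed (*), so live-migrate the guests from the remaining old nodes onto the upgraded node and upgrade those nodes with the same procedure.

Info

(*) That said, it can fail when a newer QEMU changes default values, so don't rely on it blindly. In this case every migration succeeded.

Note that on the second and later nodes, running pve6to7 before starting produces a warning that different versions of the Ceph daemons are running. On inspection the version number is the same and only the build hash differs between the buster and bullseye builds, so this can also be ignored.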

root@pve01:~# ceph versions
{
    "mon": {
        "ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)": 2,
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 1
    },
    "mgr": {
        "ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)": 2,
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 1
    },
    "osd": {
        "ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)": 2,
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 1
    },
    "mds": {
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 1
    },
    "overall": {
        "ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)": 6,
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 4
    }
}
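To confirm that only the build hash differs, the release numbers can be compared with the hash stripped. This is a minimal sketch that relies on the `ceph version X.Y.Z (hash)` string format seen in the output above; the function name `distinct_ceph_releases` is hypothetical, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# From `ceph versions` JSON on stdin, print each distinct release number
# once, ignoring the build hash in parentheses.
distinct_ceph_releases() {
    grep -oE 'ceph version [0-9]+(\.[0-9]+)*' | awk '{ print $3 }' | sort -u
}

# Typical use (not executed here):
#   ceph versions | distinct_ceph_releases
```

A single line of output means every daemon runs the same release and the pve6to7 warning comes from the hash alone; more than one line would point to a real version mismatch.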

Upgrade complete

All three nodes are upgraded and the cluster looks healthy, so the work is done.

root@pve01:~# pveversion 
pve-manager/7.0-11/63d82f4e (running kernel: 5.11.22-3-pve)
root@pve01:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Aug 12 21:00:58 2021
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.7953
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.122.26 (local)
0x00000002          1 192.168.122.27
0x00000003          1 192.168.122.28
root@pve01:~# ceph -s
  cluster:
    id:     379c9ad8-dd15-4bf1-b9fc-f5206a75fe3f
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pve01,pve02,pve03 (age 6m)
    mgr: pve01(active, since 7m), standbys: pve02, pve03
    mds: cephfs:1 {0=pve01=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 6m), 3 in (since 3M)
 
  data:
    pools:   4 pools, 129 pgs
    objects: 158.06k objects, 615 GiB
    usage:   1.8 TiB used, 1.0 TiB / 2.8 TiB avail
    pgs:     129 active+clean
 
  io:
    client:   2.7 KiB/s rd, 1.0 MiB/s wr, 0 op/s rd, 35 op/s wr
 
root@pve01:~# ceph versions
{
    "mon": {
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 3
    },
    "mds": {
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 3
    },
    "overall": {
        "ceph version 15.2.13 (de5fc19f874b2757d3c0977de8b143f6146af132) octopus (stable)": 12
    }
}

Closing

This cluster was built back in the Proxmox VE 5.x days, and this year I managed to upgrade to 7.x well ahead of EoL.
Proxmox VE really is a well-made appliance.
So, next up is Ceph Octopus to Pacific, I suppose? Ugh...
