タイトルの通りですが、AKS on Azure Stack HCI の k8s を v1.21.2 で展開した後に、コントロールプレーンノードを再起動した後に k8s が起動しないという事象が発生しました。
「systemctl status kubelet」で状態を確認してみると、次のようなエラーが発生し、kubelet が起動していませんでした。
Nov 30 11:35:13 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:13.936688 741 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"ecpacr.azurecr.io/pause:3.2\": failed to pull image \"ec pacr.azurecr.io/pause:3.2\": failed to pull and unpack image \"ecpacr.azurecr.io/pause:3.2\": failed to resolve reference \"ecpacr.azurecr.io/pause:3.2\": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized" pod="kube-system /etcd-moc-lrr9qh1ew26" Nov 30 11:35:13 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:13.936713 741 kuberuntime_manager.go:790] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"ecpacr.azurecr.io/pause:3.2\": failed to pull image \"ec pacr.azurecr.io/pause:3.2\": failed to pull and unpack image \"ecpacr.azurecr.io/pause:3.2\": failed to resolve reference \"ecpacr.azurecr.io/pause:3.2\": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized" pod="kube-system /etcd-moc-lrr9qh1ew26" Nov 30 11:35:13 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:13.936764 741 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-moc-lrr9qh1ew26_kube-system(8e85a0583a698c64f4daca0d375decfa)\" with CreatePodSandboxErro r: \"Failed to create sandbox for pod \\\"etcd-moc-lrr9qh1ew26_kube-system(8e85a0583a698c64f4daca0d375decfa)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"ecpacr.azurecr.io/pause:3.2\\\": failed to pull image \\\"ecpacr.azurecr.io/pause:3 .2\\\": failed to pull and unpack image \\\"ecpacr.azurecr.io/pause:3.2\\\": failed to resolve reference \\\"ecpacr.azurecr.io/pause:3.2\\\": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized\"" pod="kube-system/etcd-moc-l rr9qh1ew26" podUID=8e85a0583a698c64f4daca0d375decfa Nov 30 11:35:13 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:13.981869 741 kubelet.go:2291] "Error getting node" err="node \"moc-lrr9qh1ew26\" not found" Nov 30 11:35:14 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:14.035070 741 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"moc-lrr9qh1ew26.16bc4fcc087e2ba6", GenerateName:"", Namespace :"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"moc-lrr9qh1ew26", UID:"moc-lrr 9qh1ew26", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"moc-lrr9qh1ew26"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc061a0f98b59afa6, ext:11623948584, loc:( *time.Location)(0x74be600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc061a0f98b59afa6, ext:11623948584, loc:(*time.Location)(0x74be600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v 1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://10.5.0.2:6443/api/v1/namespaces/default/events": EOF'(may retry after sleeping) Nov 30 11:35:14 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:14.082278 741 kubelet.go:2291] "Error getting node" err="node \"moc-lrr9qh1ew26\" not found" Nov 30 11:35:14 moc-lrr9qh1ew26 kubelet[741]: I1130 11:35:14.104104 741 trace.go:205] Trace[2056973426]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (30-Nov-2021 11:35:02.280) (total time: 11824ms): Nov 30 11:35:14 moc-lrr9qh1ew26 kubelet[741]: Trace[2056973426]: [11.824032031s] [11.824032031s] END Nov 30 11:35:14 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:14.104137 741 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: an error on the server ("") has prevented the request from succeeding (get runtimeclasses.node.k8s.io) Nov 30 11:35:14 moc-lrr9qh1ew26 kubelet[741]: E1130 11:35:14.182788 741 kubelet.go:2291] "Error getting node" err="node \"moc-lrr9qh1ew26\" not found"
kubelet の起動のパラメーターを見ると、「–pod-infra-container-image=ecpacr.azurecr.io/pause:3.2」が指定されており、上記のメッセージでも pause コンテナーのイメージのアクセスでエラーが発生しているようでした。