Kubernetes / OpenShift Error "layer not known"

Today I ran into another error in the Kubernetes cluster that was not immediately obvious. I run an OKD (OpenShift) 4.5 cluster with 3 masters and 2 workers.

After several nodes became unavailable, I rebooted the entire cluster. Unfortunately, the reboot took a very long time on some nodes, so I restarted those nodes via hard reset without further ado.

After all nodes had restarted and were in "Ready" status, one of the nodes could not start pods.

So it was off to troubleshooting on the node and in the cluster. First, I look at which pods cannot be started:

oc get pod --all-namespaces -o wide
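On a busy cluster the wide listing is long, so it can help to summarise the stuck pods per node. The following is only a sketch: the helper name count_stuck_pods_by_node is made up for this post, and it assumes the input is one "NODE PHASE" pair per line, as produced by the custom-columns query shown in the comment.

```shell
# Hypothetical helper: reads "NODE PHASE" pairs on stdin and prints how many
# pods per node are neither Running nor Succeeded, busiest node first.
count_stuck_pods_by_node() {
  awk '$2 != "Running" && $2 != "Succeeded" { n[$1]++ }
       END { for (node in n) print n[node], node }' | sort -rn
}

# Usage against a live cluster (assumes a logged-in oc client):
# oc get pod --all-namespaces --no-headers \
#   -o custom-columns=NODE:.spec.nodeName,PHASE:.status.phase \
#   | count_stuck_pods_by_node
```

Pods that hang while their sandbox is being created are still in phase Pending, so they show up in this count.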

I see that all of the failing pods are on the same host. So I take a closer look at one of them:

oc describe pod -n <pod-namespace> <podname>

And here I see the following error:

Failed to create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "XXXX": layer not known

That does not get me much further. So I SSH into the node and check whether all services are running:

systemctl
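Plain systemctl prints every unit, which makes a failed one easy to overlook. As a rough sketch, the output can be narrowed down to failed units. The helper name failed_units is invented here, and it assumes the usual column order of systemctl list-units (UNIT LOAD ACTIVE SUB DESCRIPTION).

```shell
# Hypothetical helper: reads `systemctl list-units` rows on stdin
# (columns: UNIT LOAD ACTIVE SUB DESCRIPTION) and prints the failed units.
failed_units() {
  awk '$3 == "failed" { print $1 }'
}

# On the node (assumes a systemd host):
# systemctl list-units --all --plain --no-legend | failed_units
# Built-in shortcut with the same intent:
# systemctl --failed
```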

I see that the service "nodeip-configuration.service" has failed. So I take a closer look:

systemctl status nodeip-configuration.service

And here the following error message appears:

podman[11387]: Error: error creating container storage: layer not known

Now it is at least clearer which "layer" is meant: it must be a container storage layer. So I asked Google again and found this entry:

https://bugzilla.redhat.com/show_bug.cgi?id=1857224

OK. Apparently the node took the hard reset badly and the contents of "/var/lib/containers" are no longer in a clean state. So I follow the recommendation:

systemctl stop kubelet
systemctl stop crio
rm -rf /var/lib/containers/
systemctl start crio
systemctl start kubelet

And voila, the pods on the node can start again!
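Since the rm step deletes all locally stored images and containers on the node (they are pulled again afterwards), it can be worth printing the sequence before running it for real. This is a minimal sketch of the same five steps with a dry-run switch. The variables STORAGE_DIR and DRY_RUN and the function names are my own additions, not part of the Bugzilla recommendation.

```shell
# Sketch only: STORAGE_DIR and DRY_RUN are hypothetical variables.
STORAGE_DIR="${STORAGE_DIR:-/var/lib/containers}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same order as above: stop kubelet and CRI-O, wipe storage, start them again.
reset_container_storage() {
  run systemctl stop kubelet
  run systemctl stop crio
  run rm -rf "$STORAGE_DIR"
  run systemctl start crio
  run systemctl start kubelet
}

# Call reset_container_storage to preview the steps; set DRY_RUN=0
# (as root, on the affected node only) to actually execute them.
```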
