Kubernetes / OpenShift Error "layer not known"

Today I ran into another error in my Kubernetes cluster that was not immediately obvious. I run an OKD (OpenShift) 4.5 cluster here with 3 masters and 2 workers.

After several nodes became unavailable, I rebooted the entire cluster. Unfortunately, the reboot took a very long time on some nodes, so I restarted those nodes with a hard reset without further ado.

After all nodes had restarted and reported "Ready" status, one of them could not start any pods :-/

So it was time to troubleshoot on the node and in the cluster. First, I looked at which pods could not be started:

oc get pod --all-namespaces -o wide
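With many pods in the cluster, the affected node is easier to spot from a per-node count than from the full listing. The following is only a sketch: the helper name `stuck_pods_per_node` is made up for this post, and it assumes the pod list is reduced to two columns (node and phase) via `custom-columns` first.

```shell
# Reads "NODE PHASE" pairs on stdin and prints "<count> <node>" for every
# node hosting pods whose phase is neither Running nor Succeeded.
stuck_pods_per_node() {
  awk '$2 != "Running" && $2 != "Succeeded" { n[$1]++ }
       END { for (node in n) print n[node], node }' | sort -rn
}

# Usage against a live cluster (needs a logged-in oc session):
#   oc get pod --all-namespaces --no-headers \
#     -o custom-columns=NODE:.spec.nodeName,PHASE:.status.phase \
#     | stuck_pods_per_node
```

Pods stuck in sandbox creation stay in the `Pending` phase, so a single node with a high count is a strong hint that the problem is local to that host.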

It turned out that all failing pods were scheduled on the same host. So I took a closer look at one of them:

oc describe pod -n <pod-namespace> <podname>

The output showed the following error:

Failed to create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "XXXX": layer not known

That didn't get me much further. So I connected to the node via SSH and checked whether all services were running:

systemctl
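A plain `systemctl` prints every unit, which is a long list on a cluster node. As a sketch, the list can be narrowed to failed units only (`systemctl --failed` is the short form). The helper `failed_unit_names` is a hypothetical name used for illustration; it just keeps the first column.

```shell
# Reads "systemctl list-units" style rows on stdin and prints the unit names.
failed_unit_names() {
  awk 'NF { print $1 }'
}

# Usage on the node:
#   systemctl list-units --state=failed --no-legend --plain | failed_unit_names
```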

I saw that the service "nodeip-configuration.service" had failed, so I took a closer look:

systemctl status nodeip-configuration.service

The status output contained the following error message:

podman[11387]: Error: error creating container storage: layer not known
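Both the pod event and the podman log line end in the same string, so a quick check for it can confirm that other units or nodes are hit by the same problem. A minimal sketch, with `has_layer_error` as a made-up helper name:

```shell
# Succeeds (exit status 0) if the text on stdin contains the storage error.
has_layer_error() {
  grep -q 'layer not known'
}

# Usage on the node, checking the current boot's log of the failed unit:
#   journalctl -u nodeip-configuration.service -b --no-pager | has_layer_error \
#     && echo "container storage looks damaged"
```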

Now it was at least clearer which "layer" was meant: apparently a container storage layer. A web search turned up this entry:

https://bugzilla.redhat.com/show_bug.cgi?id=1857224

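OK. Apparently the node had taken the hard reset badly, and the container storage under "/var/lib/containers" was no longer consistent. Deleting that directory removes all locally stored images and containers on the node; the images are pulled again when the pods start. So I followed the recommendation: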

systemctl stop kubelet
systemctl stop crio
rm -rf /var/lib/containers/
systemctl start crio
systemctl start kubelet

And voilà, the pods on the node could start again!
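Since the `rm -rf` in the middle is destructive, the five commands can be wrapped in a function that only prints them unless it is told to run them. This is a sketch, not part of the recommendation in the bug report: the function name `reset_container_storage` and the `--execute` switch are my own invention.

```shell
# Prints the repair commands by default; runs them only with --execute.
reset_container_storage() {
  runner="echo"                      # dry run: print each command
  if [ "${1:-}" = "--execute" ]; then
    runner=""                        # empty prefix: commands run for real
  fi
  $runner systemctl stop kubelet
  $runner systemctl stop crio
  $runner rm -rf /var/lib/containers/
  $runner systemctl start crio
  $runner systemctl start kubelet
}

# Usage on the affected node, as root:
#   reset_container_storage            # review what would be done
#   reset_container_storage --execute  # actually do it
```

The order matters: kubelet is stopped first so it does not ask CRI-O to start new containers while the storage is being removed, and it is started last for the same reason.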
