API-initiated Eviction

API-initiated eviction is the process by which you use the Eviction API to create an Eviction object that triggers graceful pod termination.

You can request eviction by calling the Eviction API directly, or programmatically using a client of the API server, like the kubectl drain command. This creates an Eviction object, which causes the API server to terminate the Pod.

API-initiated evictions respect your configured PodDisruptionBudgets and terminationGracePeriodSeconds.

Using the API to create an Eviction object for a Pod is like performing a policy-controlled DELETE operation on the Pod.

Calling the Eviction API

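You can use a Kubernetes language client to access the Kubernetes API and create an Eviction object. To do this, you POST the attempted operation, similar to the following examples. The first uses policy/v1, which is available in Kubernetes v1.22 and later. The second uses policy/v1beta1, which was deprecated in v1.22 in favor of policy/v1 and is meant for earlier releases: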

{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}

{
  "apiVersion": "policy/v1beta1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}
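
As a rough illustration of the same request built from code, the following Python sketch assembles the Eviction body and the eviction subresource URL using only the standard library. The helper name build_eviction_request and its parameters are hypothetical, the endpoint is the placeholder host used elsewhere on this page, and authentication and TLS settings are deliberately left out, so treat it as a starting point rather than a complete client.

```python
import json
import urllib.request


def build_eviction_request(api_server, namespace, pod_name, api_version="policy/v1"):
    """Build (but do not send) a POST request asking to evict one Pod.

    Hypothetical helper for illustration; a real call also needs
    credentials (for example a bearer token) and CA configuration.
    """
    body = {
        "apiVersion": api_version,
        "kind": "Eviction",
        "metadata": {"name": pod_name, "namespace": namespace},
    }
    # The Eviction is posted to the "eviction" subresource of the Pod.
    url = (
        f"{api_server.rstrip('/')}/api/v1/namespaces/{namespace}"
        f"/pods/{pod_name}/eviction"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_eviction_request(
    "https://your-cluster-api-endpoint.example", "default", "quux"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing api_version="policy/v1beta1" produces the second body shown above. Sending the request would be a separate step, for example with urllib.request.urlopen and a suitably configured SSL context.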

Alternatively, you can attempt an eviction operation by accessing the API using curl or wget, similar to the following example:

curl -v -H 'Content-type: application/json' https://your-cluster-api-endpoint.example/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json

How API-initiated eviction works

When you request an eviction using the API, the API server performs admission checks and responds in one of the following ways:

  • 200 OK: the eviction is allowed, the Eviction subresource is created, and the Pod is deleted, similar to sending a DELETE request to the Pod URL.
  • 429 Too Many Requests: the eviction is not currently allowed because of the configured PodDisruptionBudget. You may be able to attempt the eviction again later. You might also see this response because of API rate limiting.
  • 500 Internal Server Error: the eviction is not allowed because there is a misconfiguration, such as when multiple PodDisruptionBudgets reference the same Pod.

If the Pod you want to evict isn't part of a workload that has a PodDisruptionBudget, the API server always returns 200 OK and allows the eviction.
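
A client might translate the responses listed above into actions along these lines. This is a minimal sketch: the EvictionOutcome enum and classify_eviction_response function are hypothetical names, and treating every 2xx status (not only 200) as success is an assumption made for robustness.

```python
from enum import Enum


class EvictionOutcome(Enum):
    EVICTED = "evicted"              # eviction allowed, Pod is being deleted
    RETRY_LATER = "retry_later"      # blocked by a PodDisruptionBudget or rate limiting
    MISCONFIGURED = "misconfigured"  # for example, several PodDisruptionBudgets match the Pod
    UNEXPECTED = "unexpected"        # anything this page does not describe


def classify_eviction_response(status_code):
    """Map an HTTP status code from the Eviction API to a coarse outcome."""
    # Assumption: any 2xx status counts as success, not only 200 OK.
    if 200 <= status_code < 300:
        return EvictionOutcome.EVICTED
    if status_code == 429:
        return EvictionOutcome.RETRY_LATER
    if status_code == 500:
        return EvictionOutcome.MISCONFIGURED
    return EvictionOutcome.UNEXPECTED


for code in (200, 429, 500, 404):
    print(code, classify_eviction_response(code).value)
```

Keeping the classification separate from the HTTP call makes it easy to decide retry behavior in one place and to test it without a cluster.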

If the API server allows the eviction, the Pod is deleted as follows:

  1. The Pod resource in the API server is updated with a deletion timestamp, after which the API server considers the Pod resource to be terminated. The Pod resource is also marked with the configured grace period.
  2. The kubelet on the node where the local Pod is running notices that the Pod resource is marked for termination and starts to gracefully shut down the local Pod.
  3. While the kubelet is shutting the Pod down, the control plane removes the Pod from Endpoint and EndpointSlice objects. As a result, controllers no longer consider the Pod as a valid object.
  4. After the grace period for the Pod expires, the kubelet forcefully terminates the local Pod.
  5. The kubelet tells the API server to remove the Pod resource.
  6. The API server deletes the Pod resource.
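
The role of the grace period in steps 2 and 4 can be illustrated with a small model. The function below is a hypothetical simplification, not kubelet code: it assumes the application needs a fixed number of seconds to exit on its own, and reports when the local Pod stops and whether the kubelet had to terminate it forcefully.

```python
def simulate_graceful_shutdown(grace_period_seconds, app_exit_seconds):
    """Return (seconds_until_stopped, was_forced) for one Pod.

    Simplified model: if the application exits within the grace period,
    shutdown is graceful; otherwise it is forcefully terminated when the
    grace period expires.
    """
    if app_exit_seconds <= grace_period_seconds:
        return app_exit_seconds, False
    return grace_period_seconds, True


print(simulate_graceful_shutdown(30, 5))   # (5, False)
print(simulate_graceful_shutdown(30, 90))  # (30, True)
```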

Troubleshooting stuck evictions

In some cases, your applications may enter a broken state, where the Eviction API will only return 429 or 500 responses until you intervene. This can happen if, for example, a ReplicaSet creates pods for your application but new pods do not enter a Ready state. You may also notice this behavior in cases where the last evicted Pod had a long termination grace period.

If you notice stuck evictions, try one of the following solutions:

  • Abort or pause the automated operation causing the issue. Investigate the stuck application before you restart the operation.
  • Wait a while, then directly delete the Pod from your cluster control plane instead of using the Eviction API.
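
The second option could be automated roughly as follows. This sketch assumes two caller-supplied callables, try_evict (returns the HTTP status code of one eviction attempt) and delete_pod (deletes the Pod directly); both names, the retry count, and the wait time are hypothetical. Note that a direct delete is not checked against the PodDisruptionBudget, so a fallback like this is best reserved for cases you have already investigated.

```python
import time


def evict_or_delete(try_evict, delete_pod, max_attempts=5, wait_seconds=5.0,
                    sleep=time.sleep):
    """Retry eviction while it is refused, then fall back to a direct delete.

    try_evict() -> int   HTTP status code of one eviction attempt (assumed)
    delete_pod() -> None deletes the Pod without using the Eviction API (assumed)
    """
    for attempt in range(max_attempts):
        status = try_evict()
        if 200 <= status < 300:
            return "evicted"
        if status not in (429, 500):
            # Responses this page does not describe are surfaced, not retried.
            raise RuntimeError(f"unexpected eviction response: {status}")
        if attempt < max_attempts - 1:
            sleep(wait_seconds)  # wait a while before trying again
    delete_pod()
    return "deleted"


# Usage with stand-in callables instead of a real cluster:
responses = iter([429, 429, 200])
print(evict_or_delete(lambda: next(responses), lambda: None,
                      sleep=lambda seconds: None))  # prints "evicted"
```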

Last modified October 25, 2022 at 3:58 PM PST: Add page weights to concepts -> scheduling-eviction pages (66df1d729)