1 - Kubernetes Component SLI Metrics
Kubernetes v1.26 [alpha]
As an alpha feature, Kubernetes lets you configure Service Level Indicator (SLI) metrics
for each Kubernetes component binary. This metric endpoint is exposed on the serving
HTTPS port of each component, at the path /metrics/slis. You must enable the
ComponentSLIs feature gate for every component from which you want to scrape SLI metrics.
SLI Metrics
With SLI metrics enabled, each Kubernetes component exposes two metrics, labeled per healthcheck:
- a gauge (which represents the current state of the healthcheck)
- a counter (which records the cumulative counts observed for each healthcheck state)
You can use the metric information to calculate per-component availability statistics. For example, the API server checks the health of etcd. You can work out and report how available or unavailable etcd has been, as reported by its client, the API server.
The Prometheus gauge data looks like this:
```
# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="autoregister-completion",type="healthz"} 1
kubernetes_healthcheck{name="autoregister-completion",type="readyz"} 1
kubernetes_healthcheck{name="etcd",type="healthz"} 1
kubernetes_healthcheck{name="etcd",type="readyz"} 1
kubernetes_healthcheck{name="etcd-readiness",type="readyz"} 1
kubernetes_healthcheck{name="informer-sync",type="readyz"} 1
kubernetes_healthcheck{name="log",type="healthz"} 1
kubernetes_healthcheck{name="log",type="readyz"} 1
kubernetes_healthcheck{name="ping",type="healthz"} 1
kubernetes_healthcheck{name="ping",type="readyz"} 1
```
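As a minimal sketch, assume you have already scraped text like the output above from a component's /metrics/slis endpoint. The following Python parses the gauge samples into a dictionary keyed by (name, type) and lists any check whose current value is not 1. The helper names are hypothetical and not part of Kubernetes. The 0 reading in the sample input is an invented failing value, included only for illustration.

```python
import re

# Matches a gauge sample such as:
# kubernetes_healthcheck{name="etcd",type="readyz"} 1
SAMPLE_RE = re.compile(r'^kubernetes_healthcheck\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_healthcheck_gauge(text):
    """Return {(name, type): value} for every kubernetes_healthcheck sample."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # a different metric family
        labels = dict(LABEL_RE.findall(match.group(1)))
        result[(labels["name"], labels["type"])] = float(match.group(2))
    return result


def failing_checks(gauge):
    """Sorted (name, type) keys whose current gauge value is not 1."""
    return sorted(key for key, value in gauge.items() if value != 1.0)


sample = """\
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="etcd",type="readyz"} 1
kubernetes_healthcheck{name="informer-sync",type="readyz"} 0
"""
print(failing_checks(parse_healthcheck_gauge(sample)))  # [('informer-sync', 'readyz')]
```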
The counter data looks like this:
```
# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="autoregister-completion",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="etcd-readiness",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="log",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="log",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="readyz"} 15
```
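To turn the counter into a per-check availability ratio, you can divide the success count by the total across all statuses. The sketch below is an illustration, not part of Kubernetes. Its helper names are hypothetical, it treats every status other than success as a failure, and its input reuses three lines from the sample above.

```python
import re

# Matches a counter sample such as:
# kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
SAMPLE_RE = re.compile(r'^kubernetes_healthchecks_total\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def availability_from_counters(text):
    """Return {(name, type): success / total} from cumulative counter samples."""
    counts = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # HELP/TYPE comments, blank lines, other metric families
        labels = dict(LABEL_RE.findall(match.group(1)))
        per_status = counts.setdefault((labels["name"], labels["type"]), {})
        status = labels["status"]
        per_status[status] = per_status.get(status, 0.0) + float(match.group(2))
    return {
        key: per_status.get("success", 0.0) / sum(per_status.values())
        for key, per_status in counts.items()
        if sum(per_status.values()) > 0
    }


counters = """\
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
"""
for (name, check_type), ratio in sorted(availability_from_counters(counters).items()):
    print(f"{name} ({check_type}): {ratio:.4f}")
# etcd (readyz): 1.0000
# informer-sync (readyz): 0.9333
```

These counters are cumulative, so the ratio covers everything the counter has recorded so far. For an availability figure over a specific window, subtract the counts from an earlier scrape before dividing.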
Using this data
The component SLI metrics endpoint is intended to be scraped at a high frequency. Scraping
at a high frequency gives you finer granularity of the gauge's signal, which
can then be used to calculate SLOs. The /metrics/slis endpoint provides the raw data necessary
to calculate an availability SLO for the respective Kubernetes component.
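One way to derive an availability SLI from the gauge is to measure the fraction of time it read 1. This assumes you store each scrape as a (timestamp, value) pair. The Python sketch below treats each reading as holding until the next scrape, and its sample data is hypothetical. It also shows why scrape frequency matters, because a coarse series can miss a failure entirely.

```python
def gauge_availability(samples):
    """Fraction of the observed time span during which the gauge read 1.

    samples: (unix_timestamp_seconds, gauge_value) pairs for one healthcheck,
    sorted by time. Each value is treated as holding until the next sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to cover a time span")
    healthy = total = 0.0
    for (t0, value), (t1, _) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        total += t1 - t0
        if value == 1:
            healthy += t1 - t0
    if total == 0:
        raise ValueError("samples span zero seconds")
    return healthy / total


def meets_slo(samples, target):
    """True if the measured availability is at least the target, e.g. 0.99."""
    return gauge_availability(samples) >= target


# Hypothetical scrapes of one check every 10 seconds, with a single failed reading.
fine = [(0, 1), (10, 1), (20, 0), (30, 1), (40, 1)]
# The same 40 seconds scraped only at the start and end misses the failure.
coarse = [(0, 1), (40, 1)]
print(gauge_availability(fine))    # 0.75
print(gauge_availability(coarse))  # 1.0
print(meets_slo(fine, 0.99))       # False
```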
2 - Node metrics data
The kubelet gathers metric statistics at the node, volume, pod and container level, and emits this information in the Summary API.
You can send a proxied request to the stats summary API via the Kubernetes API server.
Here is an example of a Summary API request for a node named minikube:
```shell
kubectl get --raw "/api/v1/nodes/minikube/proxy/stats/summary"
```
Here is the same API call using curl:
```shell
# You need to run "kubectl proxy" first
# Change 8080 to the port that "kubectl proxy" assigns
curl http://localhost:8080/api/v1/nodes/minikube/proxy/stats/summary
```
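The response is JSON. As a sketch, the Python below reads a few fields from it. The field names (node.cpu.usageNanoCores, pods[].podRef, containers[].memory.workingSetBytes) are assumed from the kubelet's stats/v1alpha1 Summary schema. The trimmed payload is invented for illustration. In practice you would load the output of the kubectl command above instead.

```python
import json


def node_cpu_cores(summary):
    """Current node CPU usage in cores, converted from usageNanoCores."""
    return summary["node"]["cpu"]["usageNanoCores"] / 1e9


def pod_memory_working_set(summary):
    """Map "namespace/name" to the summed container workingSetBytes of each pod."""
    usage = {}
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        total = 0
        for container in pod.get("containers", []):
            # Memory stats may be missing for a container; count them as 0.
            total += container.get("memory", {}).get("workingSetBytes", 0)
        usage[f'{ref["namespace"]}/{ref["name"]}'] = total
    return usage


# Hypothetical, heavily trimmed Summary API response, for illustration only.
raw = """
{
  "node": {"nodeName": "minikube", "cpu": {"usageNanoCores": 250000000}},
  "pods": [
    {"podRef": {"name": "coredns-abc", "namespace": "kube-system", "uid": "1"},
     "containers": [{"name": "coredns", "memory": {"workingSetBytes": 20971520}}]}
  ]
}
"""
summary = json.loads(raw)
print(node_cpu_cores(summary))          # 0.25
print(pod_memory_working_set(summary))  # {'kube-system/coredns-abc': 20971520}
```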
Beginning with metrics-server 0.6.x, metrics-server queries the /metrics/resource
kubelet endpoint, and not /stats/summary.
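The kubelet's /metrics/resource endpoint reports CPU time as a cumulative counter, such as container_cpu_usage_seconds_total. The metrics reference lists that metric as cumulative core-seconds, so a usage figure in cores has to be derived from two samples. The sketch below shows only that arithmetic. It is not metrics-server's implementation, and the function name and sample values are hypothetical.

```python
def cpu_cores_from_counter(prev, curr):
    """Average CPU usage in cores between two samples of a cumulative counter.

    prev, curr: (timestamp_seconds, cumulative_core_seconds) tuples, for
    example two scrapes of container_cpu_usage_seconds_total for one container.
    Returns None when the counter went backwards (e.g. the container restarted).
    """
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        return None  # counter reset: the delta between these samples is meaningless
    return (v1 - v0) / (t1 - t0)


# 1.5 core-seconds consumed over 15 seconds is an average of 0.1 cores.
print(cpu_cores_from_counter((100.0, 12.5), (115.0, 14.0)))  # 0.1
```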
Summary metrics API source
By default, Kubernetes fetches node summary metrics data using an embedded cAdvisor that runs within the kubelet.
Summary API data via CRI
Kubernetes v1.23 [alpha]
If you enable the PodAndContainerStatsFromCRI feature gate in your
cluster, and you use a container runtime that supports statistics access via
Container Runtime Interface (CRI), then the kubelet fetches Pod- and container-level
metric data using CRI, and not via cAdvisor.
What's next
The task pages for Troubleshooting Clusters discuss how to use a metrics pipeline that relies on this data.
3 - Kubernetes Metrics Reference
Metrics (auto-generated 2022 Nov 01)
This page details the metrics that different Kubernetes components export. You can query the metrics endpoint for these components using an HTTP scrape, and fetch the current metrics data in Prometheus format.
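The HELP line of each metric in a scrape carries its stability level in brackets, as in the [ALPHA] prefix shown in the SLI samples earlier on this page. Assuming that convention holds for the metrics you scrape, a short Python sketch can map metric names to levels. Metrics without a bracketed tag are reported as UNKNOWN. The function name is hypothetical.

```python
import re

# "# HELP <name> [LEVEL] <help text>"; the bracketed level and the text are optional.
HELP_RE = re.compile(r'^# HELP (\S+)(?: \[(\w+)\])?(?: (.*))?$')


def stability_levels(text):
    """Map each metric name to the stability tag found in its HELP line."""
    levels = {}
    for line in text.splitlines():
        match = HELP_RE.match(line.strip())
        if match:
            name, level, _help = match.groups()
            levels[name] = level or "UNKNOWN"
    return levels


scrape = """\
# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="etcd",type="readyz"} 1
"""
print(stability_levels(scrape))  # {'kubernetes_healthcheck': 'ALPHA'}
```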
List of Stable Kubernetes Metrics
Name | Stability Level | Type | Help | Labels | Const Labels |
---|---|---|---|---|---|
apiserver_admission_controller_admission_duration_seconds | STABLE | Histogram | Admission controller latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). | name operation rejected type | None |
apiserver_admission_step_admission_duration_seconds | STABLE | Histogram | Admission sub-step latency histogram in seconds, broken out for each operation and API resource and step type (validate or admit). | operation rejected type | None |
apiserver_admission_webhook_admission_duration_seconds | STABLE | Histogram | Admission webhook latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). | name operation rejected type | None |
apiserver_current_inflight_requests | STABLE | Gauge | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. | request_kind | None |
apiserver_longrunning_requests | STABLE | Gauge | Gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope and component. Not all requests are tracked this way. | component group resource scope subresource verb version | None |
apiserver_request_duration_seconds | STABLE | Histogram | Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. | component dry_run group resource scope subresource verb version | None |
apiserver_request_total | STABLE | Counter | Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, scope, component, and HTTP response code. | code component dry_run group resource scope subresource verb version | None |
apiserver_requested_deprecated_apis | STABLE | Gauge | Gauge of deprecated APIs that have been requested, broken out by API group, version, resource, subresource, and removed_release. | group removed_release resource subresource version | None |
apiserver_response_sizes | STABLE | Histogram | Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component. | component group resource scope subresource verb version | None |
apiserver_storage_objects | STABLE | Gauge | Number of stored objects at the time of last check split by kind. | resource | None |
node_collector_evictions_total | STABLE | Counter | Number of Node evictions that happened since current instance of NodeController started. | zone | None |
scheduler_framework_extension_point_duration_seconds | STABLE | Histogram | Latency for running all plugins of a specific extension point. | extension_point profile status | None |
scheduler_pending_pods | STABLE | Gauge | Number of pending pods, by the queue type. 'active' means number of pods in activeQ; 'backoff' means number of pods in backoffQ; 'unschedulable' means number of pods in unschedulablePods. | queue | None |
scheduler_pod_scheduling_attempts | STABLE | Histogram | Number of attempts to successfully schedule a pod. | None | None |
scheduler_pod_scheduling_duration_seconds | STABLE | Histogram | E2e latency for a pod being scheduled which may include multiple scheduling attempts. | attempts | None |
scheduler_preemption_attempts_total | STABLE | Counter | Total preemption attempts in the cluster till now | None | None |
scheduler_preemption_victims | STABLE | Histogram | Number of selected preemption victims | None | None |
scheduler_queue_incoming_pods_total | STABLE | Counter | Number of pods added to scheduling queues by event and queue type. | event queue | None |
scheduler_schedule_attempts_total | STABLE | Counter | Number of attempts to schedule pods, by the result. 'unschedulable' means a pod could not be scheduled, while 'error' means an internal scheduler problem. | profile result | None |
scheduler_scheduling_attempt_duration_seconds | STABLE | Histogram | Scheduling attempt latency in seconds (scheduling algorithm + binding) | profile result | None |
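Histogram metrics in this table, such as apiserver_request_duration_seconds, are exposed in Prometheus format as cumulative _bucket series with an le (upper bound) label. You can estimate a quantile from those buckets by linear interpolation within the bucket that contains the target rank, the same approach PromQL's histogram_quantile() takes. The Python below is a simplified sketch of that idea, not the Prometheus implementation. The function name and bucket counts are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound and
    ending with (math.inf, total_count), like the le buckets of a histogram.
    Observations are assumed to be spread evenly within each bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if bound == math.inf:
                # The rank lies in the open-ended bucket: report the last finite bound.
                return buckets[i - 1][0] if i > 0 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical request-latency buckets (seconds -> cumulative request count).
buckets = [(0.05, 60), (0.1, 90), (0.25, 98), (0.5, 100), (math.inf, 100)]
print(histogram_quantile(0.99, buckets))  # 0.375
```

Because the estimate assumes a uniform spread inside each bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.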
List of Alpha Kubernetes Metrics
Name | Stability Level | Type | Help | Labels | Const Labels |
---|---|---|---|---|---|
aggregator_openapi_v2_regeneration_count | ALPHA | Counter | Counter of OpenAPI v2 spec regeneration count broken down by causing APIService name and reason. | apiservice reason | None |
aggregator_openapi_v2_regeneration_duration | ALPHA | Gauge | Gauge of OpenAPI v2 spec regeneration duration in seconds. | reason | None |
aggregator_unavailable_apiservice | ALPHA | Custom | Gauge of APIServices which are marked as unavailable broken down by APIService name. | name | None |
aggregator_unavailable_apiservice_total | ALPHA | Counter | Counter of APIServices which are marked as unavailable broken down by APIService name and reason. | name reason | None |
apiextensions_openapi_v2_regeneration_count | ALPHA | Counter | Counter of OpenAPI v2 spec regeneration count broken down by causing CRD name and reason. | crd reason | None |
apiextensions_openapi_v3_regeneration_count | ALPHA | Counter | Counter of OpenAPI v3 spec regeneration count broken down by group, version, causing CRD and reason. | crd group reason version | None |
apiserver_admission_step_admission_duration_seconds_summary | ALPHA | Summary | Admission sub-step latency summary in seconds, broken out for each operation and API resource and step type (validate or admit). | operation rejected type | None |
apiserver_admission_webhook_fail_open_count | ALPHA | Counter | Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating). | name type | None |
apiserver_admission_webhook_rejection_count | ALPHA | Counter | Admission webhook rejection count, identified by name and broken out for each admission type (validating or admit) and operation. Additional labels specify an error type (calling_webhook_error or apiserver_internal_error if an error occurred; no_error otherwise) and optionally a non-zero rejection code if the webhook rejects the request with an HTTP status code (honored by the apiserver when the code is greater or equal to 400). Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded. | error_type name operation rejection_code type | None |
apiserver_admission_webhook_request_total | ALPHA | Counter | Admission webhook request total, identified by name and broken out for each admission type (validating or mutating) and operation. Additional labels specify whether the request was rejected or not and an HTTP status code. Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded. | code name operation rejected type | None |
apiserver_audit_error_total | ALPHA | Counter | Counter of audit events that failed to be audited properly. Plugin identifies the plugin affected by the error. | plugin | None |
apiserver_audit_event_total | ALPHA | Counter | Counter of audit events generated and sent to the audit backend. | None | None |
apiserver_audit_level_total | ALPHA | Counter | Counter of policy levels for audit events (1 per request). | level | None |
apiserver_audit_requests_rejected_total | ALPHA | Counter | Counter of apiserver requests rejected due to an error in audit logging backend. | None | None |
apiserver_cache_list_fetched_objects_total | ALPHA | Counter | Number of objects read from watch cache in the course of serving a LIST request | index resource_prefix | None |
apiserver_cache_list_returned_objects_total | ALPHA | Counter | Number of objects returned for a LIST request from watch cache | resource_prefix | None |
apiserver_cache_list_total | ALPHA | Counter | Number of LIST requests served from watch cache | index resource_prefix | None |
apiserver_cel_compilation_duration_seconds | ALPHA | Histogram | | None | None |
apiserver_cel_evaluation_duration_seconds | ALPHA | Histogram | | None | None |
apiserver_certificates_registry_csr_honored_duration_total | ALPHA | Counter | Total number of issued CSRs with a requested duration that was honored, sliced by signer (only kubernetes.io signer names are specifically identified) | signerName | None |
apiserver_certificates_registry_csr_requested_duration_total | ALPHA | Counter | Total number of issued CSRs with a requested duration, sliced by signer (only kubernetes.io signer names are specifically identified) | signerName | None |
apiserver_client_certificate_expiration_seconds | ALPHA | Histogram | Distribution of the remaining lifetime on the certificate used to authenticate a request. | None | None |
apiserver_crd_webhook_conversion_duration_seconds | ALPHA | Histogram | CRD webhook conversion duration in seconds | crd_name from_version succeeded to_version | None |
apiserver_current_inqueue_requests | ALPHA | Gauge | Maximal number of queued requests in this apiserver per request kind in last second. | request_kind | None |
apiserver_delegated_authn_request_duration_seconds | ALPHA | Histogram | Request latency in seconds. Broken down by status code. | code | None |
apiserver_delegated_authn_request_total | ALPHA | Counter | Number of HTTP requests partitioned by status code. | code | None |
apiserver_delegated_authz_request_duration_seconds | ALPHA | Histogram | Request latency in seconds. Broken down by status code. | code | None |
apiserver_delegated_authz_request_total | ALPHA | Counter | Number of HTTP requests partitioned by status code. | code | None |
apiserver_egress_dialer_dial_duration_seconds | ALPHA | Histogram | Dial latency histogram in seconds, labeled by the protocol (http-connect or grpc), transport (tcp or uds) | protocol transport | None |
apiserver_egress_dialer_dial_failure_count | ALPHA | Counter | Dial failure count, labeled by the protocol (http-connect or grpc), transport (tcp or uds), and stage (connect or proxy). The stage indicates at which stage the dial failed | protocol stage transport | None |
apiserver_envelope_encryption_dek_cache_fill_percent | ALPHA | Gauge | Percent of the cache slots currently occupied by cached DEKs. | None | None |
apiserver_envelope_encryption_dek_cache_inter_arrival_time_seconds | ALPHA | Histogram | Time (in seconds) of inter arrival of transformation requests. | transformation_type | None |
apiserver_flowcontrol_current_executing_requests | ALPHA | Gauge | Number of requests in initial (for a WATCH) or any (for a non-WATCH) execution stage in the API Priority and Fairness subsystem | flow_schema priority_level | None |
apiserver_flowcontrol_current_inqueue_requests | ALPHA | Gauge | Number of requests currently pending in queues of the API Priority and Fairness subsystem | flow_schema priority_level | None |
apiserver_flowcontrol_current_r | ALPHA | Gauge | R(time of last change) | priority_level | None |
apiserver_flowcontrol_dispatch_r | ALPHA | Gauge | R(time of last dispatch) | priority_level | None |
apiserver_flowcontrol_dispatched_requests_total | ALPHA | Counter | Number of requests executed by API Priority and Fairness subsystem | flow_schema priority_level | None |
apiserver_flowcontrol_epoch_advance_total | ALPHA | Counter | Number of times the queueset's progress meter jumped backward | priority_level success | None |
apiserver_flowcontrol_latest_s | ALPHA | Gauge | S(most recently dispatched request) | priority_level | None |
apiserver_flowcontrol_next_discounted_s_bounds | ALPHA | Gauge | min and max, over queues, of S(oldest waiting request in queue) - estimated work in progress | bound priority_level | None |
apiserver_flowcontrol_next_s_bounds | ALPHA | Gauge | min and max, over queues, of S(oldest waiting request in queue) | bound priority_level | None |
apiserver_flowcontrol_priority_level_request_utilization | ALPHA | TimingRatioHistogram | Observations, at the end of every nanosecond, of number of requests (as a fraction of the relevant limit) waiting or in any stage of execution (but only initial stage for WATCHes) | phase priority_level | None |
apiserver_flowcontrol_priority_level_seat_utilization | ALPHA | TimingRatioHistogram | Observations, at the end of every nanosecond, of utilization of seats for any stage of execution (but only initial stage for WATCHes) | priority_level | map[phase:executing] |
apiserver_flowcontrol_read_vs_write_current_requests | ALPHA | TimingRatioHistogram | Observations, at the end of every nanosecond, of the number of requests (as a fraction of the relevant limit) waiting or in regular stage of execution | phase request_kind | None |
apiserver_flowcontrol_rejected_requests_total | ALPHA | Counter | Number of requests rejected by API Priority and Fairness subsystem | flow_schema priority_level reason | None |
apiserver_flowcontrol_request_concurrency_in_use | ALPHA | Gauge | Concurrency (number of seats) occupied by the currently executing (initial stage for a WATCH, any stage otherwise) requests in the API Priority and Fairness subsystem | flow_schema priority_level | None |
apiserver_flowcontrol_request_concurrency_limit | ALPHA | Gauge | Shared concurrency limit in the API Priority and Fairness subsystem | priority_level | None |
apiserver_flowcontrol_request_dispatch_no_accommodation_total | ALPHA | Counter | Number of times a dispatch attempt resulted in a non accommodation due to lack of available seats | flow_schema priority_level | None |
apiserver_flowcontrol_request_execution_seconds | ALPHA | Histogram | Duration of initial stage (for a WATCH) or any (for a non-WATCH) stage of request execution in the API Priority and Fairness subsystem | flow_schema priority_level type | None |
apiserver_flowcontrol_request_queue_length_after_enqueue | ALPHA | Histogram | Length of queue in the API Priority and Fairness subsystem, as seen by each request after it is enqueued | flow_schema priority_level | None |
apiserver_flowcontrol_request_wait_duration_seconds | ALPHA | Histogram | Length of time a request spent waiting in its queue | execute flow_schema priority_level | None |
apiserver_flowcontrol_watch_count_samples | ALPHA | Histogram | count of watchers for mutating requests in API Priority and Fairness | flow_schema priority_level | None |
apiserver_flowcontrol_work_estimated_seats | ALPHA | Histogram | Number of estimated seats (maximum of initial and final seats) associated with requests in API Priority and Fairness | flow_schema priority_level | None |
apiserver_init_events_total | ALPHA | Counter | Counter of init events processed in watch cache broken by resource type. | resource | None |
apiserver_kube_aggregator_x509_insecure_sha1_total | ALPHA | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment) | None | None |
apiserver_kube_aggregator_x509_missing_san_total | ALPHA | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate OR the number of connection failures due to the lack of x509 certificate SAN extension missing (either/or, based on the runtime environment) | None | None |
apiserver_request_aborts_total | ALPHA | Counter | Number of requests which apiserver aborted possibly due to a timeout, for each group, version, verb, resource, subresource and scope | group resource scope subresource verb version | None |
apiserver_request_body_sizes | ALPHA | Histogram | Apiserver request body sizes broken out by size. | resource verb | None |
apiserver_request_filter_duration_seconds | ALPHA | Histogram | Request filter latency distribution in seconds, for each filter type | filter | None |
apiserver_request_post_timeout_total | ALPHA | Counter | Tracks the activity of the request handlers after the associated requests have been timed out by the apiserver | source status | None |
apiserver_request_slo_duration_seconds | ALPHA | Histogram | Response latency distribution (not counting webhook duration) in seconds for each verb, group, version, resource, subresource, scope and component. | component group resource scope subresource verb version | None |
apiserver_request_terminations_total | ALPHA | Counter | Number of requests which apiserver terminated in self-defense. | code component group resource scope subresource verb version | None |
apiserver_request_timestamp_comparison_time | ALPHA | Histogram | Time taken for comparison of old vs new objects in UPDATE or PATCH requests | code_path | None |
apiserver_selfrequest_total | ALPHA | Counter | Counter of apiserver self-requests broken out for each verb, API resource and subresource. | resource subresource verb | None |
apiserver_storage_data_key_generation_duration_seconds | ALPHA | Histogram | Latencies in seconds of data encryption key(DEK) generation operations. | None | None |
apiserver_storage_data_key_generation_failures_total | ALPHA | Counter | Total number of failed data encryption key(DEK) generation operations. | None | None |
apiserver_storage_db_total_size_in_bytes | ALPHA | Gauge | Total size of the storage database file physically allocated in bytes. | endpoint | None |
apiserver_storage_envelope_transformation_cache_misses_total | ALPHA | Counter | Total number of cache misses while accessing key decryption key(KEK). | None | None |
apiserver_storage_list_evaluated_objects_total | ALPHA | Counter | Number of objects tested in the course of serving a LIST request from storage | resource | None |
apiserver_storage_list_fetched_objects_total | ALPHA | Counter | Number of objects read from storage in the course of serving a LIST request | resource | None |
apiserver_storage_list_returned_objects_total | ALPHA | Counter | Number of objects returned for a LIST request from storage | resource | None |
apiserver_storage_list_total | ALPHA | Counter | Number of LIST requests served from storage | resource | None |
apiserver_storage_transformation_duration_seconds | ALPHA | Histogram | Latencies in seconds of value transformation operations. | transformation_type | None |
apiserver_storage_transformation_operations_total | ALPHA | Counter | Total number of transformations. | status transformation_type transformer_prefix | None |
apiserver_terminated_watchers_total | ALPHA | Counter | Counter of watchers closed due to unresponsiveness broken by resource type. | resource | None |
apiserver_tls_handshake_errors_total | ALPHA | Counter | Number of requests dropped with 'TLS handshake error from' error | None | None |
apiserver_validating_admission_policy_check_duration_seconds | ALPHA | Histogram | Validation admission latency for individual validation expressions in seconds, labeled by policy and param resource, further including binding, state and enforcement action taken. | enforcement_action params policy policy_binding state validation_expression | None |
apiserver_validating_admission_policy_check_total | ALPHA | Counter | Validation admission policy check total, labeled by policy and param resource, and further identified by binding, validation expression, enforcement action taken, and state. | enforcement_action params policy policy_binding state validation_expression | None |
apiserver_validating_admission_policy_definition_total | ALPHA | Counter | Validation admission policy count total, labeled by state and enforcement action. | enforcement_action state | None |
apiserver_watch_cache_events_dispatched_total | ALPHA | Counter | Counter of events dispatched in watch cache broken by resource type. | resource | None |
apiserver_watch_cache_initializations_total | ALPHA | Counter | Counter of watch cache initializations broken by resource type. | resource | None |
apiserver_watch_events_sizes | ALPHA | Histogram | Watch event size distribution in bytes | group kind version | None |
apiserver_watch_events_total | ALPHA | Counter | Number of events sent in watch clients | group kind version | None |
apiserver_webhooks_x509_insecure_sha1_total | ALPHA | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment) | None | None |
apiserver_webhooks_x509_missing_san_total | ALPHA | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate OR the number of connection failures due to the lack of x509 certificate SAN extension missing (either/or, based on the runtime environment) | None | None |
attachdetach_controller_forced_detaches | ALPHA | Counter | Number of times the A/D Controller performed a forced detach | None | None |
attachdetach_controller_total_volumes | ALPHA | Custom | Number of volumes in A/D Controller | plugin_name state | None |
authenticated_user_requests | ALPHA | Counter | Counter of authenticated requests broken out by username. | username | None |
authentication_attempts | ALPHA | Counter | Counter of authenticated attempts. | result | None |
authentication_duration_seconds | ALPHA | Histogram | Authentication duration in seconds broken out by result. | result | None |
authentication_token_cache_active_fetch_count | ALPHA | Gauge | | status | None |
authentication_token_cache_fetch_total | ALPHA | Counter | | status | None |
authentication_token_cache_request_duration_seconds | ALPHA | Histogram | | status | None |
authentication_token_cache_request_total | ALPHA | Counter | | status | None |
cloudprovider_aws_api_request_duration_seconds | ALPHA | Histogram | Latency of AWS API calls | request |
None |
cloudprovider_aws_api_request_errors | ALPHA | Counter | AWS API errors | request |
None |
cloudprovider_aws_api_throttled_requests_total | ALPHA | Counter | AWS API throttled requests | operation_name |
None |
cloudprovider_azure_api_request_duration_seconds | ALPHA | Histogram | Latency of an Azure API call | request resource_group source subscription_id |
None |
cloudprovider_azure_api_request_errors | ALPHA | Counter | Number of errors for an Azure API call | request resource_group source subscription_id |
None |
cloudprovider_azure_api_request_ratelimited_count | ALPHA | Counter | Number of rate limited Azure API calls | request resource_group source subscription_id |
None |
cloudprovider_azure_api_request_throttled_count | ALPHA | Counter | Number of throttled Azure API calls | request resource_group source subscription_id |
None |
cloudprovider_azure_op_duration_seconds | ALPHA | Histogram | Latency of an Azure service operation | request resource_group source subscription_id |
None |
cloudprovider_azure_op_failure_count | ALPHA | Counter | Number of failed Azure service operations | request resource_group source subscription_id |
None |
cloudprovider_gce_api_request_duration_seconds | ALPHA | Histogram | Latency of a GCE API call | region request version zone |
None |
cloudprovider_gce_api_request_errors | ALPHA | Counter | Number of errors for an API call | region request version zone |
None |
cloudprovider_vsphere_api_request_duration_seconds | ALPHA | Histogram | Latency of vsphere api call | request |
None |
cloudprovider_vsphere_api_request_errors | ALPHA | Counter | vsphere Api errors | request |
None |
cloudprovider_vsphere_operation_duration_seconds | ALPHA | Histogram | Latency of vsphere operation call | operation |
None |
cloudprovider_vsphere_operation_errors | ALPHA | Counter | vsphere operation errors | operation |
None |
cloudprovider_vsphere_vcenter_versions | ALPHA | Custom | Versions for connected vSphere vCenters | hostname version build |
None |
container_cpu_usage_seconds_total | ALPHA | Custom | Cumulative cpu time consumed by the container in core-seconds | container pod namespace |
None |
container_memory_working_set_bytes | ALPHA | Custom | Current working set of the container in bytes | container pod namespace |
None |
container_start_time_seconds | ALPHA | Custom | Start time of the container since unix epoch in seconds | container pod namespace |
None |
cronjob_controller_cronjob_job_creation_skew_duration_seconds | ALPHA | Histogram | Time between when a cronjob is scheduled to be run, and when the corresponding job is created | None | None |
csi_operations_seconds | ALPHA | Histogram | Container Storage Interface operation duration with gRPC error code status total | driver_name grpc_status_code method_name migrated |
None |
endpoint_slice_controller_changes | ALPHA | Counter | Number of EndpointSlice changes | operation |
None |
endpoint_slice_controller_desired_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices that would exist with perfect endpoint allocation | None | None |
endpoint_slice_controller_endpoints_added_per_sync | ALPHA | Histogram | Number of endpoints added on each Service sync | None | None |
endpoint_slice_controller_endpoints_desired | ALPHA | Gauge | Number of endpoints desired | None | None |
endpoint_slice_controller_endpoints_removed_per_sync | ALPHA | Histogram | Number of endpoints removed on each Service sync | None | None |
endpoint_slice_controller_endpointslices_changed_per_sync | ALPHA | Histogram | Number of EndpointSlices changed on each Service sync | topology |
None |
endpoint_slice_controller_num_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices | None | None |
endpoint_slice_controller_syncs | ALPHA | Counter | Number of EndpointSlice syncs | result |
None |
endpoint_slice_mirroring_controller_addresses_skipped_per_sync | ALPHA | Histogram | Number of addresses skipped on each Endpoints sync due to being invalid or exceeding MaxEndpointsPerSubset | None | None |
endpoint_slice_mirroring_controller_changes | ALPHA | Counter | Number of EndpointSlice changes | operation |
None |
endpoint_slice_mirroring_controller_desired_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices that would exist with perfect endpoint allocation | None | None |
endpoint_slice_mirroring_controller_endpoints_added_per_sync | ALPHA | Histogram | Number of endpoints added on each Endpoints sync | None | None |
endpoint_slice_mirroring_controller_endpoints_desired | ALPHA | Gauge | Number of endpoints desired | None | None |
endpoint_slice_mirroring_controller_endpoints_removed_per_sync | ALPHA | Histogram | Number of endpoints removed on each Endpoints sync | None | None |
endpoint_slice_mirroring_controller_endpoints_sync_duration | ALPHA | Histogram | Duration of syncEndpoints() in seconds | None | None |
endpoint_slice_mirroring_controller_endpoints_updated_per_sync | ALPHA | Histogram | Number of endpoints updated on each Endpoints sync | None | None |
endpoint_slice_mirroring_controller_num_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices | None | None |
ephemeral_volume_controller_create_failures_total | ALPHA | Counter | Number of PersistenVolumeClaims creation requests | None | None |
ephemeral_volume_controller_create_total | ALPHA | Counter | Number of PersistenVolumeClaims creation requests | None | None |
etcd_bookmark_counts | ALPHA | Gauge | Number of etcd bookmarks (progress notify events) split by kind. | resource | None |
etcd_lease_object_counts | ALPHA | Histogram | Number of objects attached to a single etcd lease. | None | None |
etcd_request_duration_seconds | ALPHA | Histogram | Etcd request latency in seconds for each operation and object type. | operation type | None |
etcd_version_info | ALPHA | Gauge | Etcd server's binary version | binary_version | None |
field_validation_request_duration_seconds | ALPHA | Histogram | Response latency distribution in seconds for each field validation value and whether field validation is enabled or not | enabled field_validation | None |
garbagecollector_controller_resources_sync_error_total | ALPHA | Counter | Number of garbage collector resources sync errors | None | None |
get_token_count | ALPHA | Counter | Counter of total Token() requests to the alternate token source | None | None |
get_token_fail_count | ALPHA | Counter | Counter of failed Token() requests to the alternate token source | None | None |
job_controller_job_finished_total | ALPHA | Counter | The number of finished Jobs | completion_mode reason result | None |
job_controller_job_pods_finished_total | ALPHA | Counter | The number of finished Pods that are fully tracked | completion_mode result | None |
job_controller_job_sync_duration_seconds | ALPHA | Histogram | The time it took to sync a job | action completion_mode result | None |
job_controller_job_sync_total | ALPHA | Counter | The number of job syncs | action completion_mode result | None |
job_controller_pod_failures_handled_by_failure_policy_total | ALPHA | Counter | The number of failed Pods handled by failure policy with respect to the failure policy action applied based on the matched rule. Possible values of the action label correspond to the possible values for the failure policy rule action, which are: "FailJob", "Ignore" and "Count". | action | None |
job_controller_terminated_pods_tracking_finalizer_total | ALPHA | Counter | The number of terminated pods (phase Failed or Succeeded) that have the finalizer batch.kubernetes.io/job-tracking. The event label can be "add" or "delete". | event | None |
kube_apiserver_clusterip_allocator_allocated_ips | ALPHA | Gauge | Gauge measuring the number of allocated IPs for Services | cidr | None |
kube_apiserver_clusterip_allocator_allocation_errors_total | ALPHA | Counter | Number of errors trying to allocate Cluster IPs | cidr scope | None |
kube_apiserver_clusterip_allocator_allocation_total | ALPHA | Counter | Number of Cluster IP allocations | cidr scope | None |
kube_apiserver_clusterip_allocator_available_ips | ALPHA | Gauge | Gauge measuring the number of available IPs for Services | cidr | None |
kube_apiserver_pod_logs_pods_logs_backend_tls_failure_total | ALPHA | Counter | Total number of requests for pods/logs that failed due to kubelet server TLS verification | None | None |
kube_apiserver_pod_logs_pods_logs_insecure_backend_total | ALPHA | Counter | Total number of requests for pods/logs sliced by usage type: enforce_tls, skip_tls_allowed, skip_tls_denied | usage | None |
kube_pod_resource_limit | ALPHA | Custom | Resources limit for workloads on the cluster, broken down by pod. This shows the resource usage the scheduler and kubelet expect per pod for resources along with the unit for the resource if any. | namespace pod node scheduler priority resource unit | None |
kube_pod_resource_request | ALPHA | Custom | Resources requested by workloads on the cluster, broken down by pod. This shows the resource usage the scheduler and kubelet expect per pod for resources along with the unit for the resource if any. | namespace pod node scheduler priority resource unit | None |
kubelet_certificate_manager_client_expiration_renew_errors | ALPHA | Counter | Counter of certificate renewal errors. | None | None |
kubelet_certificate_manager_client_ttl_seconds | ALPHA | Gauge | Gauge of the TTL (time-to-live) of the Kubelet's client certificate. The value is in seconds until certificate expiry (negative if already expired). If client certificate is invalid or unused, the value will be +INF. | None | None |
kubelet_certificate_manager_server_rotation_seconds | ALPHA | Histogram | Histogram of the number of seconds the previous certificate lived before being rotated. | None | None |
kubelet_certificate_manager_server_ttl_seconds | ALPHA | Gauge | Gauge of the shortest TTL (time-to-live) of the Kubelet's serving certificate. The value is in seconds until certificate expiry (negative if already expired). If serving certificate is invalid or unused, the value will be +INF. | None | None |
kubelet_cgroup_manager_duration_seconds | ALPHA | Histogram | Duration in seconds for cgroup manager operations. Broken down by method. | operation_type | None |
kubelet_container_log_filesystem_used_bytes | ALPHA | Custom | Bytes used by the container's logs on the filesystem. | uid namespace pod container | None |
kubelet_containers_per_pod_count | ALPHA | Histogram | The number of containers per pod. | None | None |
kubelet_cpu_manager_pinning_errors_total | ALPHA | Counter | The number of CPU core allocations which required pinning and failed. | None | None |
kubelet_cpu_manager_pinning_requests_total | ALPHA | Counter | The number of CPU core allocations which required pinning. | None | None |
kubelet_device_plugin_alloc_duration_seconds | ALPHA | Histogram | Duration in seconds to serve a device plugin Allocation request. Broken down by resource name. | resource_name | None |
kubelet_device_plugin_registration_total | ALPHA | Counter | Cumulative number of device plugin registrations. Broken down by resource name. | resource_name | None |
kubelet_eviction_stats_age_seconds | ALPHA | Histogram | Time between when stats are collected and when a pod is evicted based on those stats, by eviction signal | eviction_signal | None |
kubelet_evictions | ALPHA | Counter | Cumulative number of pod evictions by eviction signal | eviction_signal | None |
kubelet_graceful_shutdown_end_time_seconds | ALPHA | Gauge | Last graceful shutdown end time since unix epoch in seconds | None | None |
kubelet_graceful_shutdown_start_time_seconds | ALPHA | Gauge | Last graceful shutdown start time since unix epoch in seconds | None | None |
kubelet_http_inflight_requests | ALPHA | Gauge | Number of in-flight HTTP requests | long_running method path server_type | None |
kubelet_http_requests_duration_seconds | ALPHA | Histogram | Duration in seconds to serve HTTP requests | long_running method path server_type | None |
kubelet_http_requests_total | ALPHA | Counter | Number of HTTP requests received since the server started | long_running method path server_type | None |
kubelet_kubelet_credential_provider_plugin_duration | ALPHA | Histogram | Duration of execution in seconds for credential provider plugin | plugin_name | None |
kubelet_kubelet_credential_provider_plugin_errors | ALPHA | Counter | Number of errors from credential provider plugin | plugin_name | None |
kubelet_lifecycle_handler_http_fallbacks_total | ALPHA | Counter | The number of times lifecycle handlers successfully fell back to http from https. | None | None |
kubelet_managed_ephemeral_containers | ALPHA | Gauge | Current number of ephemeral containers in pods managed by this kubelet. Ephemeral containers will be ignored if disabled by the EphemeralContainers feature gate, and this number will be 0. | None | None |
kubelet_node_name | ALPHA | Gauge | The node's name. The count is always 1. | node | None |
kubelet_pleg_discard_events | ALPHA | Counter | The number of discard events in PLEG. | None | None |
kubelet_pleg_last_seen_seconds | ALPHA | Gauge | Timestamp in seconds when PLEG was last seen active. | None | None |
kubelet_pleg_relist_duration_seconds | ALPHA | Histogram | Duration in seconds for relisting pods in PLEG. | None | None |
kubelet_pleg_relist_interval_seconds | ALPHA | Histogram | Interval in seconds between relisting in PLEG. | None | None |
kubelet_pod_resources_endpoint_errors_get_allocatable | ALPHA | Counter | Number of requests to the PodResource GetAllocatableResources endpoint which returned an error. Broken down by server api version. | server_api_version | None |
kubelet_pod_resources_endpoint_errors_list | ALPHA | Counter | Number of requests to the PodResource List endpoint which returned an error. Broken down by server api version. | server_api_version | None |
kubelet_pod_resources_endpoint_requests_get_allocatable | ALPHA | Counter | Number of requests to the PodResource GetAllocatableResources endpoint. Broken down by server api version. | server_api_version | None |
kubelet_pod_resources_endpoint_requests_list | ALPHA | Counter | Number of requests to the PodResource List endpoint. Broken down by server api version. | server_api_version | None |
kubelet_pod_resources_endpoint_requests_total | ALPHA | Counter | Cumulative number of requests to the PodResource endpoint. Broken down by server api version. | server_api_version | None |
kubelet_pod_start_duration_seconds | ALPHA | Histogram | Duration in seconds from kubelet seeing a pod for the first time to the pod starting to run | None | None |
kubelet_pod_status_sync_duration_seconds | ALPHA | Histogram | Duration in seconds to sync a pod status update. Measures time from detection of a change to pod status until the API is successfully updated for that pod, even if multiple intervening changes to pod status occur. | None | None |
kubelet_pod_worker_duration_seconds | ALPHA | Histogram | Duration in seconds to sync a single pod. Broken down by operation type: create, update, or sync | operation_type | None |
kubelet_pod_worker_start_duration_seconds | ALPHA | Histogram | Duration in seconds from kubelet seeing a pod to starting a worker. | None | None |
kubelet_preemptions | ALPHA | Counter | Cumulative number of pod preemptions by preemption resource | preemption_signal | None |
kubelet_run_podsandbox_duration_seconds | ALPHA | Histogram | Duration in seconds of the run_podsandbox operations. Broken down by RuntimeClass.Handler. | runtime_handler | None |
kubelet_run_podsandbox_errors_total | ALPHA | Counter | Cumulative number of the run_podsandbox operation errors by RuntimeClass.Handler. | runtime_handler | None |
kubelet_running_containers | ALPHA | Gauge | Number of containers currently running | container_state | None |
kubelet_running_pods | ALPHA | Gauge | Number of pods that have a running pod sandbox | None | None |
kubelet_runtime_operations_duration_seconds | ALPHA | Histogram | Duration in seconds of runtime operations. Broken down by operation type. | operation_type | None |
kubelet_runtime_operations_errors_total | ALPHA | Counter | Cumulative number of runtime operation errors by operation type. | operation_type | None |
kubelet_runtime_operations_total | ALPHA | Counter | Cumulative number of runtime operations by operation type. | operation_type | None |
kubelet_server_expiration_renew_errors | ALPHA | Counter | Counter of certificate renewal errors. | None | None |
kubelet_started_containers_errors_total | ALPHA | Counter | Cumulative number of errors when starting containers | code container_type | None |
kubelet_started_containers_total | ALPHA | Counter | Cumulative number of containers started | container_type | None |
kubelet_started_host_process_containers_errors_total | ALPHA | Counter | Cumulative number of errors when starting hostprocess containers. This metric will only be collected on Windows and requires WindowsHostProcessContainers feature gate to be enabled. | code container_type | None |
kubelet_started_host_process_containers_total | ALPHA | Counter | Cumulative number of hostprocess containers started. This metric will only be collected on Windows and requires WindowsHostProcessContainers feature gate to be enabled. | container_type | None |
kubelet_started_pods_errors_total | ALPHA | Counter | Cumulative number of errors when starting pods | None | None |
kubelet_started_pods_total | ALPHA | Counter | Cumulative number of pods started | None | None |
kubelet_volume_metric_collection_duration_seconds | ALPHA | Histogram | Duration in seconds to calculate volume stats | metric_source | None |
kubelet_volume_stats_available_bytes | ALPHA | Custom | Number of available bytes in the volume | namespace persistentvolumeclaim | None |
kubelet_volume_stats_capacity_bytes | ALPHA | Custom | Capacity in bytes of the volume | namespace persistentvolumeclaim | None |
kubelet_volume_stats_health_status_abnormal | ALPHA | Custom | Abnormal volume health status. The count is either 1 or 0. 1 indicates the volume is unhealthy, 0 indicates volume is healthy | namespace persistentvolumeclaim | None |
kubelet_volume_stats_inodes | ALPHA | Custom | Maximum number of inodes in the volume | namespace persistentvolumeclaim | None |
kubelet_volume_stats_inodes_free | ALPHA | Custom | Number of free inodes in the volume | namespace persistentvolumeclaim | None |
kubelet_volume_stats_inodes_used | ALPHA | Custom | Number of used inodes in the volume | namespace persistentvolumeclaim | None |
kubelet_volume_stats_used_bytes | ALPHA | Custom | Number of used bytes in the volume | namespace persistentvolumeclaim | None |
kubeproxy_network_programming_duration_seconds | ALPHA | Histogram | In Cluster Network Programming Latency in seconds | None | None |
kubeproxy_sync_proxy_rules_duration_seconds | ALPHA | Histogram | SyncProxyRules latency in seconds | None | None |
kubeproxy_sync_proxy_rules_endpoint_changes_pending | ALPHA | Gauge | Pending proxy rules Endpoint changes | None | None |
kubeproxy_sync_proxy_rules_endpoint_changes_total | ALPHA | Counter | Cumulative proxy rules Endpoint changes | None | None |
kubeproxy_sync_proxy_rules_iptables_restore_failures_total | ALPHA | Counter | Cumulative proxy iptables restore failures | None | None |
kubeproxy_sync_proxy_rules_iptables_total | ALPHA | Gauge | Number of proxy iptables rules programmed | table | None |
kubeproxy_sync_proxy_rules_last_queued_timestamp_seconds | ALPHA | Gauge | The last time a sync of proxy rules was queued | None | None |
kubeproxy_sync_proxy_rules_last_timestamp_seconds | ALPHA | Gauge | The last time proxy rules were successfully synced | None | None |
kubeproxy_sync_proxy_rules_no_local_endpoints_total | ALPHA | Gauge | Number of services with a Local traffic policy and no endpoints | traffic_policy | None |
kubeproxy_sync_proxy_rules_service_changes_pending | ALPHA | Gauge | Pending proxy rules Service changes | None | None |
kubeproxy_sync_proxy_rules_service_changes_total | ALPHA | Counter | Cumulative proxy rules Service changes | None | None |
kubernetes_build_info | ALPHA | Gauge | A metric with a constant '1' value labeled by major, minor, git version, git commit, git tree state, build date, Go version, and compiler from which Kubernetes was built, and platform on which it is running. | build_date compiler git_commit git_tree_state git_version go_version major minor platform | None |
kubernetes_feature_enabled | ALPHA | Gauge | This metric records the data about the stage and enablement of a k8s feature. | name stage | None |
kubernetes_healthcheck | ALPHA | Gauge | This metric records the result of a single healthcheck. | name type | None |
kubernetes_healthchecks_total | ALPHA | Counter | This metric records the results of all healthcheck. | name status type | None |
leader_election_master_status | ALPHA | Gauge | Gauge of whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Please make sure to group by name. | name | None |
node_authorizer_graph_actions_duration_seconds | ALPHA | Histogram | Histogram of duration of graph actions in node authorizer. | operation | None |
node_collector_evictions_number | ALPHA | Counter | Number of Node evictions that happened since current instance of NodeController started. This metric is replaced by node_collector_evictions_total. | zone | None |
node_collector_unhealthy_nodes_in_zone | ALPHA | Gauge | Gauge measuring number of not Ready Nodes per zone. | zone | None |
node_collector_zone_health | ALPHA | Gauge | Gauge measuring percentage of healthy nodes per zone. | zone | None |
node_collector_zone_size | ALPHA | Gauge | Gauge measuring number of registered Nodes per zone. | zone | None |
node_cpu_usage_seconds_total | ALPHA | Custom | Cumulative cpu time consumed by the node in core-seconds | None | None |
node_ipam_controller_cidrset_allocation_tries_per_request | ALPHA | Histogram | Histogram measuring CIDR allocation tries per request. | clusterCIDR | None |
node_ipam_controller_cidrset_cidrs_allocations_total | ALPHA | Counter | Counter measuring total number of CIDR allocations. | clusterCIDR | None |
node_ipam_controller_cidrset_cidrs_releases_total | ALPHA | Counter | Counter measuring total number of CIDR releases. | clusterCIDR | None |
node_ipam_controller_cidrset_usage_cidrs | ALPHA | Gauge | Gauge measuring percentage of allocated CIDRs. | clusterCIDR | None |
node_ipam_controller_multicidrset_allocation_tries_per_request | ALPHA | Histogram | Histogram measuring CIDR allocation tries per request. | clusterCIDR | None |
node_ipam_controller_multicidrset_cidrs_allocations_total | ALPHA | Counter | Counter measuring total number of CIDR allocations. | clusterCIDR | None |
node_ipam_controller_multicidrset_cidrs_releases_total | ALPHA | Counter | Counter measuring total number of CIDR releases. | clusterCIDR | None |
node_ipam_controller_multicidrset_usage_cidrs | ALPHA | Gauge | Gauge measuring percentage of allocated CIDRs. | clusterCIDR | None |
node_memory_working_set_bytes | ALPHA | Custom | Current working set of the node in bytes | None | None |
number_of_l4_ilbs | ALPHA | Gauge | Number of L4 ILBs | feature | None |
plugin_manager_total_plugins | ALPHA | Custom | Number of plugins in Plugin Manager | socket_path state | None |
pod_cpu_usage_seconds_total | ALPHA | Custom | Cumulative cpu time consumed by the pod in core-seconds | pod namespace | None |
pod_memory_working_set_bytes | ALPHA | Custom | Current working set of the pod in bytes | pod namespace | None |
pod_security_errors_total | ALPHA | Counter | Number of errors preventing normal evaluation. Non-fatal errors may result in the latest restricted profile being used for evaluation. | fatal request_operation resource subresource | None |
pod_security_evaluations_total | ALPHA | Counter | Number of policy evaluations that occurred, not counting ignored or exempt requests. | decision mode policy_level policy_version request_operation resource subresource | None |
pod_security_exemptions_total | ALPHA | Counter | Number of exempt requests, not counting ignored or out of scope requests. | request_operation resource subresource | None |
prober_probe_duration_seconds | ALPHA | Histogram | Duration in seconds for a probe response. | container namespace pod probe_type | None |
prober_probe_total | ALPHA | Counter | Cumulative number of liveness, readiness, or startup probes for a container, by result. | container namespace pod pod_uid probe_type result | None |
pv_collector_bound_pv_count | ALPHA | Custom | Gauge measuring number of persistent volumes currently bound | storage_class | None |
pv_collector_bound_pvc_count | ALPHA | Custom | Gauge measuring number of persistent volume claims currently bound | namespace | None |
pv_collector_total_pv_count | ALPHA | Custom | Gauge measuring total number of persistent volumes | plugin_name volume_mode | None |
pv_collector_unbound_pv_count | ALPHA | Custom | Gauge measuring number of persistent volumes currently unbound | storage_class | None |
pv_collector_unbound_pvc_count | ALPHA | Custom | Gauge measuring number of persistent volume claims currently unbound | namespace | None |
replicaset_controller_sorting_deletion_age_ratio | ALPHA | Histogram | The ratio of chosen deleted pod's ages to the current youngest pod's age (at the time). Should be < 2. The intent of this metric is to measure the rough efficacy of the LogarithmicScaleDown feature gate's effect on the sorting (and deletion) of pods when a replicaset scales down. This only considers Ready pods when calculating and reporting. | None | None |
rest_client_exec_plugin_call_total | ALPHA | Counter | Number of calls to an exec plugin, partitioned by the type of event encountered (no_error, plugin_execution_error, plugin_not_found_error, client_internal_error) and an optional exit code. The exit code will be set to 0 if and only if the plugin call was successful. | call_status code | None |
rest_client_exec_plugin_certificate_rotation_age | ALPHA | Histogram | Histogram of the number of seconds the last auth exec plugin client certificate lived before being rotated. If auth exec plugin client certificates are unused, histogram will contain no data. | None | None |
rest_client_exec_plugin_ttl_seconds | ALPHA | Gauge | Gauge of the shortest TTL (time-to-live) of the client certificate(s) managed by the auth exec plugin. The value is in seconds until certificate expiry (negative if already expired). If auth exec plugins are unused or manage no TLS certificates, the value will be +INF. | None | None |
rest_client_rate_limiter_duration_seconds | ALPHA | Histogram | Client side rate limiter latency in seconds. Broken down by verb and host. | host verb | None |
rest_client_request_duration_seconds | ALPHA | Histogram | Request latency in seconds. Broken down by verb and host. | host verb | None |
rest_client_request_size_bytes | ALPHA | Histogram | Request size in bytes. Broken down by verb and host. | host verb | None |
rest_client_requests_total | ALPHA | Counter | Number of HTTP requests, partitioned by status code, method, and host. | code host method | None |
rest_client_response_size_bytes | ALPHA | Histogram | Response size in bytes. Broken down by verb and host. | host verb | None |
retroactive_storageclass_errors_total | ALPHA | Counter | Total number of failed retroactive StorageClass assignments to persistent volume claims | None | None |
retroactive_storageclass_total | ALPHA | Counter | Total number of retroactive StorageClass assignments to persistent volume claims | None | None |
root_ca_cert_publisher_sync_duration_seconds | ALPHA | Histogram | Duration in seconds of namespace syncs in the root CA cert publisher. | code | None |
root_ca_cert_publisher_sync_total | ALPHA | Counter | Number of namespace syncs that happened in the root CA cert publisher. | code | None |
running_managed_controllers | ALPHA | Gauge | Indicates where instances of a controller are currently running | manager name | None |
scheduler_e2e_scheduling_duration_seconds | ALPHA | Histogram | E2e scheduling latency in seconds (scheduling algorithm + binding). This metric is replaced by scheduling_attempt_duration_seconds. | profile result | None |
scheduler_goroutines | ALPHA | Gauge | Number of running goroutines split by the work they do such as binding. | operation | None |
scheduler_permit_wait_duration_seconds | ALPHA | Histogram | Duration of waiting on permit. | result | None |
scheduler_plugin_execution_duration_seconds | ALPHA | Histogram | Duration for running a plugin at a specific extension point. | extension_point plugin status | None |
scheduler_scheduler_cache_size | ALPHA | Gauge | Number of nodes, pods, and assumed (bound) pods in the scheduler cache. | type | None |
scheduler_scheduler_goroutines | ALPHA | Gauge | Number of running goroutines split by the work they do such as binding. This metric is replaced by the "goroutines" metric. | work | None |
scheduler_scheduling_algorithm_duration_seconds | ALPHA | Histogram | Scheduling algorithm latency in seconds | None | None |
scheduler_unschedulable_pods | ALPHA | Gauge | The number of unschedulable pods broken down by plugin name. A pod will increment the gauge for all plugins that caused it to not schedule and so this metric has meaning only when broken down by plugin. | plugin profile | None |
scheduler_volume_binder_cache_requests_total | ALPHA | Counter | Total number of requests to the volume binding cache | operation | None |
scheduler_volume_scheduling_stage_error_total | ALPHA | Counter | Volume scheduling stage error count | operation | None |
scrape_error | ALPHA | Custom | 1 if there was an error while getting container metrics, 0 otherwise | None | None |
service_controller_nodesync_latency_seconds | ALPHA | Histogram | A metric measuring the latency for nodesync which updates loadbalancer hosts on cluster node updates. | None | None |
service_controller_update_loadbalancer_host_latency_seconds | ALPHA | Histogram | A metric measuring the latency for updating each load balancer's hosts. | None | None |
serviceaccount_legacy_tokens_total | ALPHA | Counter | Cumulative legacy service account tokens used | None | None |
serviceaccount_stale_tokens_total | ALPHA | Counter | Cumulative stale projected service account tokens used | None | None |
serviceaccount_valid_tokens_total | ALPHA | Counter | Cumulative valid projected service account tokens used | None | None |
storage_count_attachable_volumes_in_use | ALPHA | Custom | Measure number of volumes in use | node volume_plugin | None |
storage_operation_duration_seconds | ALPHA | Histogram | Storage operation duration | migrated operation_name status volume_plugin | None |
ttl_after_finished_controller_job_deletion_duration_seconds | ALPHA | Histogram | The time it took to delete the job since it became eligible for deletion | None | None |
volume_manager_selinux_container_errors_total | ALPHA | Gauge | Number of errors when the kubelet cannot compute the SELinux context for a container. The kubelet can't start such a Pod and will retry, so the value of this metric may not represent the actual number of containers. | None | None |
volume_manager_selinux_container_warnings_total | ALPHA | Gauge | Number of ignored errors when the kubelet cannot compute the SELinux context for a container. They will become real errors when the SELinuxMountReadWriteOncePod feature is expanded to all volume access modes. | None | None |
volume_manager_selinux_pod_context_mismatch_errors_total | ALPHA | Gauge | Number of errors when a Pod defines different SELinux contexts for its containers that use the same volume. The kubelet can't start such a Pod and will retry, so the value of this metric may not represent the actual number of Pods. | None | None |
volume_manager_selinux_pod_context_mismatch_warnings_total | ALPHA | Gauge | Number of errors when a Pod defines different SELinux contexts for its containers that use the same volume. They are not errors yet, but they will become real errors when the SELinuxMountReadWriteOncePod feature is expanded to all volume access modes. | None | None |
volume_manager_selinux_volume_context_mismatch_errors_total | ALPHA | Gauge | Number of errors when a Pod uses a volume that is already mounted with a different SELinux context than the Pod needs. The kubelet can't start such a Pod and will retry, so the value of this metric may not represent the actual number of Pods. | None | None |
volume_manager_selinux_volume_context_mismatch_warnings_total | ALPHA | Gauge | Number of errors when a Pod uses a volume that is already mounted with a different SELinux context than the Pod needs. They are not errors yet, but they will become real errors when the SELinuxMountReadWriteOncePod feature is expanded to all volume access modes. | None | None |
volume_manager_selinux_volumes_admitted_total | ALPHA | Gauge | Number of volumes whose SELinux context was fine and will be mounted with mount -o context option. | None | None |
volume_manager_total_volumes | ALPHA | Custom | Number of volumes in Volume Manager | plugin_name state | None |
volume_operation_total_errors | ALPHA | Counter | Total volume operation errors | operation_name plugin_name | None |
volume_operation_total_seconds | ALPHA | Histogram | Storage operation end to end duration in seconds | operation_name plugin_name | None |
watch_cache_capacity | ALPHA | Gauge | Total capacity of watch cache broken by resource type. | resource | None |
watch_cache_capacity_decrease_total | ALPHA | Counter | Total number of watch cache capacity decrease events broken by resource type. | resource | None |
watch_cache_capacity_increase_total | ALPHA | Counter | Total number of watch cache capacity increase events broken by resource type. | resource | None |
workqueue_adds_total | ALPHA | Counter | Total number of adds handled by workqueue | name | None |
workqueue_depth | ALPHA | Gauge | Current depth of workqueue | name | None |
workqueue_longest_running_processor_seconds | ALPHA | Gauge | How many seconds has the longest running processor for workqueue been running. | name | None |
workqueue_queue_duration_seconds | ALPHA | Histogram | How long in seconds an item stays in workqueue before being requested. | name | None |
workqueue_retries_total | ALPHA | Counter | Total number of retries handled by workqueue | name | None |
workqueue_unfinished_work_seconds | ALPHA | Gauge | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. | name | None |
workqueue_work_duration_seconds | ALPHA | Histogram | How long in seconds processing an item from workqueue takes. | name | None |
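The kubernetes_healthchecks_total row in the list above describes a counter labelled by name, status and type, which is the raw material for per-component availability statistics. The Python sketch below shows one possible way to turn a scrape of /metrics/slis into a success ratio for each (name, type) healthcheck. It assumes the Prometheus text exposition format and only the "success" and "error" status values shown in the sample output at the top of this page; the function name healthcheck_availability is hypothetical, and the simple regular expressions do not handle escaped quotes inside label values. Because the counter is cumulative, the ratio covers the whole lifetime of the component process.

```python
import re
from collections import defaultdict

# One sample line of the counter, e.g.
# kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
# An optional trailing timestamp is tolerated and ignored.
SAMPLE_RE = re.compile(
    r'^kubernetes_healthchecks_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+\S+)?$'
)
# Simple label matcher; assumption: label values contain no escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def healthcheck_availability(exposition: str) -> dict[tuple[str, str], float]:
    """Return success / (success + error) for each (name, type) healthcheck."""
    counts = defaultdict(lambda: {"success": 0.0, "error": 0.0})
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # a different metric family, e.g. kubernetes_healthcheck
        labels = dict(LABEL_RE.findall(match.group("labels")))
        status = labels.get("status")
        if status in ("success", "error"):
            key = (labels.get("name", ""), labels.get("type", ""))
            counts[key][status] += float(match.group("value"))
    availability = {}
    for key, c in counts.items():
        total = c["success"] + c["error"]
        if total > 0:  # avoid dividing by zero for checks with no results
            availability[key] = c["success"] / total
    return availability


if __name__ == "__main__":
    # Hypothetical scrape contents in the same shape as the sample output.
    scrape = """\
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="etcd",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 3
kubernetes_healthchecks_total{name="ping",status="success",type="healthz"} 15
"""
    for (name, check_type), ratio in sorted(healthcheck_availability(scrape).items()):
        print(f"{name} {check_type}: {ratio:.2%}")
```

For availability over a time window rather than since process start, you would compare the counters from two scrapes (or let Prometheus compute the increase) before taking the ratio.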
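The help text for workqueue_unfinished_work_seconds says the number of stuck threads can be deduced from the rate at which the gauge increases. The reasoning is that each item still in progress adds one second of unfinished work per second of wall-clock time, so while the same items stay stuck, the slope of the gauge approximates how many of them there are. The minimal Python sketch below does that arithmetic for a single queue name; the function name estimate_stuck_workers and the (timestamp, value) sample shape are assumptions for illustration, not part of any Kubernetes API.

```python
def estimate_stuck_workers(samples: list[tuple[float, float]]) -> float:
    """Estimate stuck workers from (unix_seconds, gauge_value) samples of
    workqueue_unfinished_work_seconds for one queue name, oldest first.

    Each item that stays in progress adds 1 second of unfinished work per
    second of wall-clock time, so the slope of the gauge approximates the
    number of items that were in progress for the whole window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be ordered oldest first")
    return (v1 - v0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical scrape results: the gauge grows by 120 s over 60 s.
    print(estimate_stuck_workers([(0.0, 10.0), (30.0, 70.0), (60.0, 130.0)]))  # prints 2.0
```

If an item finishes inside the window, its accumulated time drops out of the gauge and the slope no longer reflects stuck work, so the estimate is most useful over short windows where the gauge is rising steadily.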