Recent troubleshooting cases from Kubernetes Cluster

2023-01-01

This post is an overview of the things I implemented while handling production issues in the Kubernetes Cluster.

Caution! None of the solutions I provide below should be considered absolute.

Scenario 1: Too many evicted pods on the cluster

kubectl get pods --all-namespaces

NAMESPACE   NAME            READY   STATUS    RESTARTS   AGE
staging     web-app         0/1     Evicted   34         8d
auth        auth-app        0/1     Evicted   20         5d2h
infra       mongodb-0       0/3     Evicted   50         16d
infra       redis-0         0/2     Evicted   100        79m
staging     catalogue-app   0/1     Evicted   56         2m

The nodes themselves were running fine, so I used jq to inspect one of the evicted pods and found the following in its status:

"status": {
"message": "Pod The node had condition: [DiskPressure]. ",
"phase": "Failed",
"reason": "Evicted",
"startTime": "2022-11-09T14:06:37Z"
}
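For reference, the probe was roughly along these lines (a sketch; the pod name and namespace are placeholders):

kubectl get pod web-app -n staging -o json | jq '.status'

# or list the eviction message for every evicted pod in one pass
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select(.status.reason == "Evicted") | "\(.metadata.namespace)/\(.metadata.name): \(.status.message)"'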

Workarounds

  1. Update Kubelet parameters via its config file.

SSH into your nodes, reduce imageGCHighThresholdPercent and imageGCLowThresholdPercent in the kubelet config file, and restart the kubelet:

sudo yum install -y nano
sudo nano /etc/kubernetes/kubelet/kubelet-config.json

{
    "kind": "KubeletConfiguration",
    "apiVersion": "kubelet.config.k8s.io/v1beta1",
    ...
    "imageGCHighThresholdPercent": 70,
    "imageGCLowThresholdPercent": 50,
    "maxPods": ...
}

sudo service kubelet restart
  2. Verify that the new kubelet garbage collection arguments are in the node configz endpoint
kubectl get node
kubectl proxy #open a connection to the API server
  3. Check the node configz: open a new terminal, and then run the following command:
curl -sSL "http://localhost:8001/api/v1/nodes/<NODE_NAME>/proxy/configz" | python3 -m json.tool
  4. Delete evicted pods
kubectl get pods --all-namespaces -o json | jq '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | "kubectl delete pods \(.metadata.name) -n \(.metadata.namespace)"' | xargs -n 1 bash -c

Note

  • The --image-gc-high-threshold argument (imageGCHighThresholdPercent in the config file) defines the percent of disk usage above which image garbage collection always runs. The default is 85%.
  • The --image-gc-low-threshold argument (imageGCLowThresholdPercent) defines the percent of disk usage that image garbage collection attempts to free down to; GC never runs below it. The default is 80%.
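After tuning the thresholds, it is worth confirming that the DiskPressure condition actually clears (a quick sketch; the node name is a placeholder):

kubectl describe node <NODE_NAME> | grep -i diskpressure   # the condition should report False
kubectl get nodes                                           # STATUS should stay Ready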

Scenario 2: Kubelet System OOM encountered

  1. SSH into the nodes and check the task consuming the most memory
ps -eo pid,ppid,cmd,comm,%mem,%cpu --sort=-%mem | head -10

Output:

  PID  PPID  CMD                                              COMMAND    %MEM  %CPU
26483 26454  airflow dags next-execution email_tutorial -n 2  airflow     5.9   5.0
26454 26443  python/lib/usr                                   gunicorn    6.0   3.6
26484 26454  airflow initdb                                   airflow     0.9  22.4


I discovered that Airflow was consuming most of the compute resources, so I created a dedicated node group for data-orchestration tasks, tainted it, and added a matching toleration to the Airflow deployment.

Workarounds

kubectl get nodes
kubectl taint nodes <NODE-NAME> reserved=dataTasks:NoSchedule

I had deployed Airflow with a Helm chart, so a small addition to values.yaml was enough:

tolerations:
  - key: "reserved"
    operator: "Equal"
    effect: "NoSchedule"
    value: "dataTasks"

Side notes

Diagnosing and fixing memory leaks in Python applications can save your nodes; we used a guide on exactly that to improve our applications.

Scenario 3: Users hitting insecure-access errors (net::ERR_CERT_AUTHORITY_INVALID) on our web applications because SSL/TLS certificates were not renewed or reissued after they expired.

Workarounds

Set up cert-manager, a cloud-native X.509 certificate management tool for Kubernetes.

  1. Create a Route53 policy file for cert-manager
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "arn:aws:route53:::hostedzone/*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "route53:GetChange",
                "route53:ListHostedZones",
                "route53:ListResourceRecordSets",
                "route53:ListHostedZonesByName"
            ],
            "Resource": "*"
        }
    ]
}
aws iam create-policy --policy-name route53-certmanager-policy --policy-document file://policy.json
  2. Create a service account
export SERVICE_ACCOUNT=<preferred_name>
export KUBE_NAMESPACE=<preferred_name>
export CLUSTER_NAME=<preferred_name>
export IAM_ROLE_NAME=<preferred_name>

eksctl create iamserviceaccount \
  --name $SERVICE_ACCOUNT \
  --namespace $KUBE_NAMESPACE \
  --cluster $CLUSTER_NAME \
  --attach-policy-arn arn:aws:iam::<AWS_ACCOUNT_ID>:policy/route53-certmanager-policy \
  --approve \
  --role-name $IAM_ROLE_NAME \
  --override-existing-serviceaccounts

  3. Install cert-manager

Create a values file

# values.yaml
installCRDs: true

serviceAccount:
  create: false
  name: $SERVICE_ACCOUNT
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/$IAM_ROLE_NAME
  automountServiceAccountToken: true

securityContext:
  fsGroup: 1001
helm repo add jetstack https://charts.jetstack.io
helm upgrade -f values.yaml --install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.10.1
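The Ingress annotation in the next step references a ClusterIssuer, which is not shown above. A minimal sketch of an ACME ClusterIssuer using the Route53 DNS01 solver (the issuer name and email are placeholders, and your issuer may be configured differently):

cat <<'EOF' | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod   # placeholder; must match the cert-manager.io/cluster-issuer annotation on the Ingress
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your_email>
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          route53:
            region: <AWS_REGION>
EOF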
  4. Securing Ingress Resources

A common use-case for cert-manager is requesting TLS signed certificates to secure your ingress resources.

kubectl get cert-manager --all-namespaces
NAMESPACE NAME DENIED READY ISSUER REQUESTOR AGE
hello-app certificaterequest.cert-manager.io/hello-app-tls-77p42 True True cert-manager-*** system:serviceaccount:cert-manager:** 41m


NAMESPACE NAME READY AGE
clusterissuer.cert-manager.io/cert-manager-<output> True 43d
clusterissuer.cert-manager.io/cert-manager-<output> True 40d
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: cert-manager-<output>
  name: web-app
  namespace: staging
spec:
  rules:
    - host: oluchiorji.com
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: web-app
                port:
                  number: 80
  tls: # placing a host in the TLS config will determine what ends up in the cert's subjectAltNames
    - hosts:
        - oluchiorji.com
      secretName: webapp-cert
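Once the Ingress is applied, cert-manager should create a Certificate and, after the DNS01 challenge passes, the webapp-cert secret. A quick way to confirm (names as used above):

kubectl get certificate -n staging                    # READY should eventually become True
kubectl describe certificate webapp-cert -n staging   # events explain why issuance is stuck, if it is
kubectl get secret webapp-cert -n staging             # the TLS secret the Ingress references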

Scenario 4: Applications were stuck in a CrashLoopBackOff state because of Error: secret <secret-name> not found

When we deployed applications via Helm, we noticed the following:

Helm adds the prefix sh.helm.release.<version-number>-<app-name> to the secrets it creates for a release, not the secret-name we specified via our values files.
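A quick way to see what actually exists in the namespace, and which of those secrets belong to Helm itself (the namespace is a placeholder):

kubectl get secrets -n <namespace>
kubectl get secrets -n <namespace> --field-selector type=helm.sh/release.v1   # Helm's own release-tracking secrets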

Workarounds

At first we used the Kubernetes Opaque Secret API as a quick fix, but we ran into the following problems:

  • The Secret object is convenient to use but does not support storing or retrieving secret data from external secret management systems such as AWS Secrets Manager or Parameter Store.
  • Too much hardcoding and too many YAML files scattered across local machines.
  • The secret YAML could not be added to version control.

We switched to the External Secrets Operator (ESO). The goal of ESO is to synchronize secrets from external APIs into Kubernetes. ESO is a collection of custom API resources - ExternalSecret, SecretStore and ClusterSecretStore - that provide a user-friendly abstraction for the external API and manage the lifecycle of the secrets for you.

  1. Install ESO
export KUBE_NAMESPACE=<preferred_name>
export CLUSTER_NAME=<EKS_CLUSTERNAME>
export IAM_ROLE_NAME=<iam_role_name>
export SERVICE_ACCOUNT_NAME=<sa-name>

kubectl create namespace $KUBE_NAMESPACE
helm repo add external-secrets https://charts.external-secrets.io

helm install external-secrets \
  external-secrets/external-secrets \
  -n $KUBE_NAMESPACE \
  --set installCRDs=true
  2. Create IAM policies
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "secretsmanager:DescribeSecret",
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:ListSecretVersionIds",
                "ssm:DescribeParameters",
                "ssm:GetParameter",
                "ssm:GetParameters",
                "ssm:GetParametersByPath"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:secretsmanager:eu-west-1:<AWS_ACCOUNT_ID>:*"
        },
        {
            "Action": [
                "kms:DescribeCustomKeyStores",
                "kms:ListKeys",
                "kms:ListAliases"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": [
                "kms:Decrypt",
                "kms:GetKeyRotationStatus",
                "kms:GetKeyPolicy",
                "kms:DescribeKey"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
aws iam create-policy --policy-name secret-policy --policy-document file://policy.json
  3. Create a service account
eksctl create iamserviceaccount \
  --name $SERVICE_ACCOUNT_NAME \
  --namespace $KUBE_NAMESPACE \
  --cluster $CLUSTER_NAME \
  --attach-policy-arn arn:aws:iam::<AWS_ACCOUNT_ID>:policy/secret-policy \
  --approve \
  --role-name $IAM_ROLE_NAME \
  --override-existing-serviceaccounts

  4. Create a ClusterSecretStore. A ClusterSecretStore is a global, cluster-wide SecretStore that can be referenced from all namespaces.
     We will create two stores, paramaterstore-cluster-secret and secretmanager-cluster-secret, so we can access both AWS secret providers.
# parameter-store
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: paramaterstore-cluster-secret
spec:
  provider:
    aws:
      service: ParameterStore
      region: <AWS_REGION>
      auth:
        jwt:
          serviceAccountRef:
            name: $SERVICE_ACCOUNT_NAME
            namespace: $KUBE_NAMESPACE
# secret-manager-store
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: secretmanager-cluster-secret
spec:
  provider:
    aws:
      service: SecretsManager
      region: <AWS_REGION>
      auth:
        jwt:
          serviceAccountRef:
            name: $SERVICE_ACCOUNT_NAME
            namespace: $KUBE_NAMESPACE
  5. Create test parameters. I will create a fake staging PostgreSQL DB username and password.
aws ssm put-parameter --name "/ecommerce/staging/postgresql/username" --value "admin" --description "username for staging postgreSQL DB" \
--type SecureString --region $AWS_REGION

aws ssm put-parameter --name "/ecommerce/staging/postgresql/password" --value "4g#4gGGDG9OghjuE" --description "password for staging postgreSQL DB" \
--type SecureString --region $AWS_REGION

aws secretsmanager create-secret --name ecommerce/staging --description "My test secret created with the CLI." \
--secret-string "{\"POSTGRESQL_USER\":\"admin\",\"POSTGRESQL_PASS\":\"4g#4gGGDG9OghjuE\"}"
  6. Use an ExternalSecret to fetch the secrets.
     The ExternalSecret references one of the ClusterSecretStores (the global, cluster-wide SecretStores) created in step 4.
# fetch secrets from AWS Secrets Manager

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: <metadata_name>
  namespace: <namespace>
spec:
  refreshInterval: 12h
  secretStoreRef:
    name: secretmanager-cluster-secret # the ClusterSecretStore created for AWS Secrets Manager in step 4
    kind: ClusterSecretStore
  target:
    name: <secret_name>
    creationPolicy: Owner
  data:
    - secretKey: POSTGRESQL_PASS
      remoteRef:
        key: ecommerce/staging
        property: POSTGRESQL_PASS
    - secretKey: POSTGRESQL_USER
      remoteRef:
        key: ecommerce/staging
        property: POSTGRESQL_USER
# fetch secrets from AWS Parameter Store

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: <metadata_name>
  namespace: <namespace>
spec:
  refreshInterval: 12h
  secretStoreRef:
    name: paramaterstore-cluster-secret # the ClusterSecretStore created for AWS Parameter Store in step 4
    kind: ClusterSecretStore
  target:
    name: <secret_name>
    creationPolicy: Owner
  data:
    - secretKey: POSTGRESQL_USER
      remoteRef:
        key: "/ecommerce/staging/postgresql/username"
    - secretKey: POSTGRESQL_PASS
      remoteRef:
        key: "/ecommerce/staging/postgresql/password"
  7. Reference the secret via Deployments
apiVersion: v1
kind: Pod
metadata:
  name: secret-test-pod
spec:
  containers:
    - name: test-container
      image: registry.k8s.io/busybox
      command: ["/bin/sh", "-c", "env"]
      envFrom:
        - secretRef:
            name: <secret_name>
  restartPolicy: Never
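Before pointing a real workload at the secret, it is worth confirming that ESO actually synced it (names as used above):

kubectl get externalsecret -n <namespace>                 # READY should be True with a SecretSynced status
kubectl get secret <secret_name> -n <namespace> -o yaml   # the Kubernetes Secret that ESO created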

Scenario 5: NodeNotReady

The NodeNotReady issues could have been handled effectively by the Kubernetes Cluster Autoscaler, but it does not automatically adjust the number of nodes in our cluster when pods fail or are rescheduled onto other nodes, so we had to create and remove nodes manually from the console by adjusting the node group's desired size.
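When a node flips to NotReady, the first things to check are its conditions and the kubelet itself (a sketch; the node name is a placeholder):

kubectl get nodes
kubectl describe node <NODE_NAME>            # Conditions and recent Events usually explain the NotReady state

# on the node itself:
journalctl -u kubelet --since "30 min ago"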

Some other issues we encountered are:

  • Using very small instances in node groups leads to the groups maxing out, resulting in unschedulable/evicted pods.
  • Using large instances in node groups leads to low resource utilization and increased cost.

We were able to fix the issue with Karpenter.
There are detailed guides for setting up Karpenter with whichever tool (Terraform, eksctl, kOps) you prefer.

Scenario 6: OpenSearch Shards Issues

The error we kept seeing:

"reason"=>"Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [594]/[600] maximum shards open;"

Workarounds

I wrote a detailed guide on how to delete old indices from Elasticsearch or OpenSearch using the Curator Python package.
The cleanup can be wrapped into a cron job.
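To see how close the cluster is to the shard limit before the error fires, you can query the cluster APIs (the endpoint is a placeholder; authentication flags are omitted):

curl -s "https://<opensearch-endpoint>/_cluster/health" | jq '{active_shards, active_primary_shards}'

# default per-node limit; if it was set explicitly it appears under .persistent or .transient instead
curl -s "https://<opensearch-endpoint>/_cluster/settings?include_defaults=true&flat_settings=true" | jq '.defaults["cluster.max_shards_per_node"]'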

Scenario 7: Struggling to fix Kubernetes over-provisioning

We wanted the ability to set appropriate resource requests for the pods (applications) deployed in the cluster.
The more precisely we set resource requests on our pods, the more reliably our applications run and the more room we save in the cluster.
We installed Kubecost, which gave us a per-workload view of resource usage and cost.

Workarounds

Install Kubecost

persistentVolume:
  enabled: true
  size: "2.0Gi"
  dbSize: "4.0Gi"
  storageClass: "<storage_class_name_configured_on_your_cluster>"

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: <ingress_class_name_configured_on_cluster>
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "<cert_manager_cluster_issuer>"
  hosts:
    - "<domain-name>"
  tls:
    - secretName: <preferred_name>
      hosts:
        - "<domain-name>"

networkCosts:
  enabled: true
  prometheusScrape: true

kubecostToken: "<KUBE_COST_TOKEN>"

global:
  grafana:
    enabled: false

clusterController:
  enabled: true
export RELEASE_NAME=<preferred_name>
export KUBE_NAMESPACE=<preferred_name>
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm upgrade -f values.yaml --install $RELEASE_NAME kubecost/cost-analyzer --namespace $KUBE_NAMESPACE
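To reach the Kubecost UI without going through the Ingress, a port-forward works as a quick check (this assumes the chart's default <release>-cost-analyzer service name and port 9090):

kubectl port-forward -n $KUBE_NAMESPACE svc/$RELEASE_NAME-cost-analyzer 9090:9090
# then open http://localhost:9090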

References

  1. Configure EKS Worker Nodes Image Cache