In the world of cloud native applications, resource allocation plays a crucial role in performance, cost efficiency, and stability. Kubernetes has powerful built-in tools to help us manage this more intelligently, and one such gem is the Vertical Pod Autoscaler (VPA).
If you have ever faced the headache of out-of-memory (OOM) kills, underperforming pods, or over-provisioned resources spiking your cloud bills, VPA is for you.
Vertical Pod Autoscaler is a Kubernetes component that helps right size your pods by:
Recommending CPU and memory resource requests.
Automatically updating pods with optimal resource values.
Preventing over- and under-provisioning.
It observes the real-time behavior of your workloads and adjusts resource allocation accordingly.
VPA collects usage data via the Metrics Server and provides recommendations based on:
Historical usage patterns
Real-time resource consumption
Application behavior over time
It can then apply those changes by evicting pods and recreating them with better resource configurations.
VPA supports three modes:
1. "Off"
In this mode, VPA collects real-time data and analyzes CPU and memory usage but does not apply any changes to the running pods. It's useful when you want to monitor and review recommendations manually without affecting the existing workload.
2. "Initial"
This mode applies the recommended resource values only at the time of pod creation. Once the pod is running, VPA will not make further adjustments. It's ideal when you want pods to start with optimized resources but prefer not to have any live updates that might restart the workload.
3. "Auto"
In auto mode, VPA dynamically adjusts resource allocations by evicting and recreating pods with updated CPU and memory values. This ensures your applications are always running with optimal resources and is best suited for production environments where workloads vary over time.
Choose a mode based on how much control or automation you want.
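As a sketch, the mode is selected with the `updateMode` field under `spec.updatePolicy` in the VPA object; the values are the quoted strings described above:

```yaml
# Fragment of a VerticalPodAutoscaler spec showing where the mode lives.
spec:
  updatePolicy:
    updateMode: "Off"   # "Off" | "Initial" | "Auto"
```

Starting with "Off" lets you inspect the recommendations before letting VPA act on them.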
Here is why enabling VPA is a smart move for your Kubernetes workloads:
Accurate resource sizing.
Prevents Out-of-Memory (OOM) kills and throttling.
Reduces cloud infrastructure costs.
Improves app stability and performance.
Minimizes manual resource tuning.
VPA is especially useful for long-running, stateful, or batch workloads.
Deploy the VPA Components:
First, apply the official VPA manifest to your cluster. This will install the necessary controllers (recommender, updater, and admission controller).
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Create a VPA Resource for Your Deployment:
Now, create a VerticalPodAutoscaler (VPA) resource to target your specific Deployment (for example, my-app). Below is a sample YAML configuration that sets the VPA to Auto mode, meaning it will automatically update resource requests by evicting and recreating pods.
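A minimal manifest along those lines might look like the following (the object name `my-app-vpa` is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa    # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app      # the Deployment to right-size
  updatePolicy:
    updateMode: "Auto"   # evict and recreate pods with updated requests
```

Apply it with `kubectl apply -f` like any other manifest.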
Once applied, the VPA will monitor and adjust CPU and memory resources for the targeted pods based on their real-time usage.
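To review what VPA is recommending, you can describe the VPA object (assuming it is named `my-app-vpa`); the output includes a Recommendation section with target CPU and memory values:

```shell
# List VPA objects in the current namespace
kubectl get vpa

# Show recommendations for a specific VPA object (name is illustrative)
kubectl describe vpa my-app-vpa
```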
Horizontal Pod Autoscaler (HPA):
Scales the number of pods in a Deployment, StatefulSet, or ReplicaSet.
Based on metrics like CPU utilization, memory usage, or custom metrics.
Keeps the pod size (CPU/Memory requests) fixed, but increases or decreases the pod count.
Ideal for stateless applications where scaling out helps handle load (e.g., web servers, APIs).
Vertical Pod Autoscaler (VPA):
Adjusts CPU and memory requests/limits for each pod.
Keeps the number of pods constant, but changes the resource size of each pod.
Based on real-time usage and historical data.
Suitable for stateful or long-running applications where increasing pod count is not ideal (e.g., databases, single-instance workloads).
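For contrast, here is what a minimal HPA manifest looks like; it scales a Deployment on CPU utilization instead of resizing its pods (names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa    # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70%
```

Note that running HPA and VPA on the same metric (e.g., CPU) for the same workload can cause the two autoscalers to fight each other, so combine them with care.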
Start with "Off" mode to see recommendations.
Move to "Auto" mode once you're confident.
Don’t use VPA on short-lived jobs or pods with strict uptime needs.
Use resource limits to avoid excessive allocation.
Ensure the metrics-server is running, since VPA depends on it for usage data.
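One way to keep recommendations within sane bounds is the `resourcePolicy` section of the VPA spec, which sets per-container floors and ceilings (the values below are illustrative):

```yaml
# Fragment of a VerticalPodAutoscaler spec constraining recommendations.
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: "*"    # applies to all containers in the pod
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 4Gi
```

With bounds in place, VPA will never recommend less than `minAllowed` or more than `maxAllowed`, protecting you from runaway allocations.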
Kubernetes makes it easy to scale workloads, but doing it efficiently requires more than just adding replicas.
Vertical Pod Autoscaler helps you get the most out of your infrastructure:
No more guesswork in resource allocation
Better performance during traffic bursts
Fewer disruptions and reduced cloud costs