Application Scaling
Understand how auto-scaling works and how to configure it optimally for your application.
How Auto-Scaling Works
ScallerFox uses the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale your application based on resource usage and traffic.
Monitor Metrics
Kubernetes continuously monitors the CPU and memory usage of all pods
Calculate Target
HPA calculates how many pods are needed to maintain the target CPU/memory usage
Scale Up/Down
Pods are created or terminated to match the calculated target, within your min/max bounds
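The calculation in the middle step follows the standard HPA formula: the desired pod count is the current count scaled by the ratio of current to target utilization, rounded up, then clamped to your min/max bounds. A minimal sketch in Python (illustrative of the HPA algorithm, not ScallerFox's actual implementation):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_pods: int,
                     max_pods: int) -> int:
    """Standard HPA formula: scale proportionally to the metric ratio,
    round up, then clamp to the configured min/max bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_pods, min(desired, max_pods))

# 3 pods at 90% CPU against a 60% target -> ceil(3 * 90/60) = 5 pods
print(desired_replicas(3, 90, 60, min_pods=1, max_pods=10))  # 5
```

Note that the result is always clamped: even a huge traffic spike cannot push the pod count past your configured maximum.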
Configuring Min & Max Pods
Minimum Pods
The minimum number of pods that will always be running, even during low traffic.
0 pods
Scale to zero when idle. Saves costs, but the first request after idling incurs a cold-start delay.
1 pod (Recommended)
Always available. Good for production apps.
2+ pods
High availability. Load distributed across pods.
Maximum Pods
The maximum number of pods your application can scale to during traffic spikes.
1-3 pods
Low traffic, cost-optimized
5-10 pods
Medium traffic, balanced
20+ pods
High traffic, maximum capacity
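Under the hood, the min/max bounds map onto an HPA resource. A sketch of what an equivalent manifest could look like (names and the utilization target are illustrative; also note that a stock HPA requires a minimum of at least 1, so scale-to-zero is handled by separate machinery):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # illustrative name
  minReplicas: 1              # Minimum Pods
  maxReplicas: 10             # Maximum Pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative target
```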
Scaling Strategies
Cost-Optimized
Minimize costs by using fewer pods and allowing scale-to-zero.
Configuration:
- Min Pods: 0 or 1
- Max Pods: 3-5
- Best for: Development, staging, low-traffic production
Balanced
Balance between availability and cost.
Configuration:
- Min Pods: 1-2
- Max Pods: 5-10
- Best for: Most production applications
High Availability
Maximum availability and capacity for critical applications.
Configuration:
- Min Pods: 2+
- Max Pods: 20+
- Best for: High-traffic, mission-critical applications
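The three strategies above boil down to different min/max bounds. A hypothetical helper that mirrors the presets (the names and numbers follow the lists above; this is not a real ScallerFox API):

```python
# Illustrative presets mirroring the strategies above; not a ScallerFox API.
SCALING_PRESETS = {
    "cost-optimized":    {"min_pods": 1, "max_pods": 5},
    "balanced":          {"min_pods": 2, "max_pods": 10},
    "high-availability": {"min_pods": 2, "max_pods": 20},
}

def pick_preset(strategy: str) -> dict:
    """Return the min/max pod bounds for a named scaling strategy."""
    try:
        return SCALING_PRESETS[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}")

print(pick_preset("balanced"))  # {'min_pods': 2, 'max_pods': 10}
```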
Monitoring Scaling Behavior
Use the Pod Count chart in Resource Usage to monitor how your application scales:
Scaling Up
Pod count increases during traffic spikes. Watch for patterns to understand peak times.
Scaling Down
Pod count decreases during low traffic. This helps reduce costs.
Hitting Limits
If pod count stays pinned at the maximum, consider increasing max pods or upgrading your package.