Application Scaling
Understand how auto-scaling works and how to configure it optimally for your application.
How Auto-Scaling Works
ScallerFox uses the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale your application based on resource usage and traffic.
Monitor Metrics
Kubernetes continuously monitors the CPU and memory usage of all pods
Calculate Target
HPA calculates how many pods are needed to maintain the target CPU/memory usage
Scale Up/Down
Pods are created or terminated to match the calculated target, within your min/max bounds
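The calculation in the middle step follows the standard HPA formula: the desired pod count is the current count scaled by the ratio of current to target utilization, rounded up, then clamped to your min/max bounds. A minimal sketch in Python (illustrative of the HPA algorithm, not ScallerFox's actual implementation):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_pods: int,
                     max_pods: int) -> int:
    """Standard HPA formula: scale proportionally to the metric ratio,
    round up, then clamp to the configured min/max bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_pods, min(desired, max_pods))

# 3 pods at 90% CPU against a 60% target -> ceil(3 * 90/60) = 5 pods
print(desired_replicas(3, 90, 60, min_pods=1, max_pods=10))  # 5
```

Note that the result is always clamped: even a huge traffic spike cannot push the pod count past your configured maximum.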
Configuring Min & Max Pods
Minimum Pods
The minimum number of pods that will always be running, even during low traffic.
0 pods
Scale to zero when idle. Saves costs, but the first request after idling incurs a cold-start delay.
1 pod (Recommended)
Always available. Good for production apps.
2+ pods
High availability. Load distributed across pods.
Maximum Pods
The maximum number of pods your application can scale to during traffic spikes.
1-3 pods
Low traffic, cost-optimized
5-10 pods
Medium traffic, balanced
20+ pods
High traffic, maximum capacity
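Under the hood, the min/max bounds map onto an HPA resource. A sketch of what an equivalent manifest could look like (names and the utilization target are illustrative; also note that a stock HPA requires a minimum of at least 1, so scale-to-zero is handled by separate machinery):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # illustrative name
  minReplicas: 1              # Minimum Pods
  maxReplicas: 10             # Maximum Pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative target
```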
Scaling Strategies
Cost-Optimized
Minimize costs by using fewer pods and allowing scale-to-zero.
Configuration:
- Min Pods: 0 or 1
- Max Pods: 3-5
- Best for: Development, staging, low-traffic production
Balanced
Balance between availability and cost.
Configuration:
- Min Pods: 1-2
- Max Pods: 5-10
- Best for: Most production applications
High Availability
Maximum availability and capacity for critical applications.
Configuration:
- Min Pods: 2+
- Max Pods: 20+
- Best for: High-traffic, mission-critical applications
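The three strategies above boil down to different min/max bounds. A hypothetical helper that mirrors the presets (the names and numbers follow the lists above; this is not a real ScallerFox API):

```python
# Illustrative presets mirroring the strategies above; not a ScallerFox API.
SCALING_PRESETS = {
    "cost-optimized":    {"min_pods": 1, "max_pods": 5},
    "balanced":          {"min_pods": 2, "max_pods": 10},
    "high-availability": {"min_pods": 2, "max_pods": 20},
}

def pick_preset(strategy: str) -> dict:
    """Return the min/max pod bounds for a named scaling strategy."""
    try:
        return SCALING_PRESETS[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}")

print(pick_preset("balanced"))  # {'min_pods': 2, 'max_pods': 10}
```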
Monitoring Scaling Behavior
Use the Pod Count chart in Resource Usage to monitor how your application scales:
Scaling Up
Pod count increases during traffic spikes. Watch for patterns to understand peak times.
Scaling Down
Pod count decreases during low traffic. This helps reduce costs.
Hitting Limits
If pod count stays pinned at the maximum, consider increasing max pods or upgrading your package.