Proposes Recurrent PPO (RPPO), an on-policy deep reinforcement learning method with an LSTM policy network, for intelligent autoscaling of serverless functions. Models autoscaling as a Partially Observable Markov Decision Process (POMDP) and benchmarks against DRQN, the standard Kubernetes HPA, and other baselines. Achieves a 22% throughput improvement and a 37% reduction in resource wastage over the standard Kubernetes HPA, and outperforms DRQN across workload traces.
+22% throughput improvement
−37% resource wastage reduction
POMDP decision model for partial observability
Evaluated on OpenFaaS with MicroK8s on the NeCTAR Melbourne Research Cloud (multi-node clusters).
On-policy deep RL with an LSTM policy network that captures temporal dependencies in serverless workload traces, maintaining hidden state across scaling decisions to model partial observability.
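To make the recurrence concrete, here is a minimal PyTorch sketch of an LSTM actor-critic policy of this kind. The architecture, layer sizes, and observation features are illustrative assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """LSTM actor-critic: the hidden state carries workload history across
    scaling decisions, acting as a learned summary of the unobserved state."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, n_actions)  # scaling-action logits
        self.value_head = nn.Linear(hidden_dim, 1)           # value estimate for PPO

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim) window of metric observations
        x = torch.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)
        return self.policy_head(x), self.value_head(x), hidden

# One decision step: sample a scaling action and thread the hidden state forward.
policy = RecurrentActorCritic(obs_dim=6, n_actions=5)   # assumed dimensions
obs = torch.randn(1, 1, 6)  # e.g. CPU, memory, RPS, replicas, latency, queue depth
logits, value, h = policy(obs)
action = torch.distributions.Categorical(logits=logits[:, -1]).sample()
```

Feeding `h` back into the next forward call is what lets the policy condition on history rather than only the latest metrics snapshot.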
Autoscaling is formulated as a Partially Observable MDP, handling incomplete observability of pod states and request queues in production OpenFaaS deployments.
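For reference, the standard POMDP tuple this formulation instantiates. The notation is the textbook convention; the paper's concrete state, action, and observation spaces are its own design choices.

```latex
% Standard POMDP tuple (textbook notation)
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{O}, T, \Omega, R, \gamma \rangle,
\qquad T(s' \mid s, a), \quad \Omega(o \mid s', a), \quad R(s, a)

% The agent never sees the full state s_t (pod states, queue contents);
% it conditions its policy on the observation-action history
% h_t = (o_1, a_1, \ldots, o_t), which the LSTM hidden state summarizes:
\pi_\theta(a_t \mid h_t)
```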
Custom HPA controller that replaces threshold-based scaling decisions with RL agent actions, integrating directly with the OpenFaaS provider API and Prometheus metrics; a sketch of the control loop follows.
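A minimal sketch of such a control loop, assuming Prometheus's instant-query HTTP API and an OpenFaaS gateway scale endpoint. The endpoint paths, PromQL queries, service addresses, and agent interface here are assumptions for illustration, not the paper's implementation.

```python
import requests

PROM_URL = "http://prometheus:9090"           # assumed service addresses;
GATEWAY_URL = "http://openfaas-gateway:8080"  # adjust to your deployment

class StubAgent:
    """Placeholder for the trained RPPO policy (hypothetical interface)."""
    def act(self, obs):
        return 0  # delta replicas; a real agent returns a learned action

rl_agent = StubAgent()

def query_prometheus(promql: str) -> float:
    """Fetch one scalar metric via Prometheus's instant-query HTTP API."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": promql})
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def scale_function(name: str, replicas: int) -> None:
    """Set the replica count (endpoint path assumed from the OpenFaaS
    provider API; verify against your gateway version)."""
    resp = requests.post(
        f"{GATEWAY_URL}/system/scale-function/{name}",
        json={"serviceName": name, "replicas": replicas},
    )
    resp.raise_for_status()

def control_step(name: str, current_replicas: int) -> int:
    """One RL-driven scaling decision replacing the HPA threshold rule."""
    # Observation features assembled from assumed PromQL queries.
    rps = query_prometheus(
        f'rate(gateway_function_invocation_total{{function_name="{name}"}}[1m])')
    cpu = query_prometheus(
        f'sum(rate(container_cpu_usage_seconds_total{{pod=~"{name}-.*"}}[1m]))')
    obs = [rps, cpu, float(current_replicas)]
    delta = rl_agent.act(obs)                      # e.g. action in {-2..+2}
    new_replicas = max(1, current_replicas + delta)
    scale_function(name, new_replicas)
    return new_replicas
```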
On-policy training with the PPO clipped surrogate objective enables stable policy updates across diverse serverless traffic patterns. Outperforms the off-policy DRQN baseline across all evaluated workload traces.
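For reference, the clipped surrogate objective in question (Schulman et al., 2017), written here with the policy conditioned on the observation history h_t since the network is recurrent. Clipping the probability ratio to [1−ε, 1+ε] bounds each policy update, which is the source of the stability claimed above.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid h_t)}{\pi_{\theta_\text{old}}(a_t \mid h_t)}

L^{\text{CLIP}}(\theta) =
  \hat{\mathbb{E}}_t\!\left[
    \min\!\left( r_t(\theta)\,\hat{A}_t,\;
                 \operatorname{clip}\!\bigl(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\,\hat{A}_t
    \right)
  \right]
```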
A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions
Agarwal, S. et al. — DOI: 10.1109/TSC.2024.3387661