IEEE Trans. Services Computing · Vol. 17, No. 5 · 2024

DRe-SCale

Proposes Recurrent PPO (RPPO), an on-policy deep reinforcement learning method with an LSTM policy network, for intelligent autoscaling of serverless functions. The autoscaling problem is modelled as a Partially Observable Markov Decision Process (POMDP). Benchmarked against DRQN, standard Kubernetes HPA, and other baselines, RPPO achieves a 22% throughput improvement and a 37% reduction in resource wastage over standard HPA, and outperforms DRQN across all evaluated workload traces.

Python · Kubernetes · OpenFaaS · PyTorch · Prometheus

Performance vs. Standard HPA

+22%

Throughput improvement

−37%

Resource wastage reduction

POMDP

Decision model for partial observability

Evaluated on OpenFaaS with MicroK8s on NeCTAR Melbourne Research Cloud (multi-node clusters).

RPPO Architecture

Recurrent PPO (RPPO)

On-policy deep RL with an LSTM policy network that captures temporal dependencies in serverless workload traces, maintaining hidden state across scaling decisions to model partial observability.
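A recurrent actor-critic of this kind might be sketched as follows in PyTorch. This is an illustrative layout, not the paper's exact architecture; the class name, hidden size, and head structure are assumptions.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Sketch of an LSTM actor-critic head for recurrent PPO.

    The LSTM hidden state is carried across scaling decisions, letting
    the policy integrate a history of partial observations rather than
    acting on a single metrics snapshot (hypothetical layout).
    """

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state: (h, c) or None
        out, state = self.lstm(obs_seq, state)
        return self.actor(out), self.critic(out), state
```

At inference time the returned `state` is fed back in on the next scaling decision, which is what preserves temporal context under partial observability.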

POMDP Environment Modelling

Autoscaling is formulated as a Partially Observable MDP, handling incomplete observability of pod states and request queues in production OpenFaaS deployments.
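A toy version of such an environment can illustrate the POMDP framing: the agent sees only noisy, partial metrics, while the workload process driving the queue stays hidden. All numbers below are made-up placeholders, not the paper's configuration.

```python
import random

class ServerlessScalingEnv:
    """Toy POMDP sketch of the autoscaling problem (illustrative only)."""

    ACTIONS = (-1, 0, +1)  # scale down, hold, scale up

    def __init__(self, max_replicas=10):
        self.max_replicas = max_replicas
        self.reset()

    def reset(self):
        self.replicas = 1
        self.queue = 0
        return self._observe()

    def _observe(self):
        # Partial observability: the agent sees a noisy queue estimate,
        # not the true queue or the hidden arrival process.
        noisy_queue = max(0, self.queue + random.randint(-2, 2))
        return (noisy_queue, self.replicas)

    def step(self, action_idx):
        self.replicas = min(self.max_replicas,
                            max(1, self.replicas + self.ACTIONS[action_idx]))
        arrivals = random.randint(0, 8)            # hidden workload process
        served = min(self.queue + arrivals, 4 * self.replicas)
        self.queue = self.queue + arrivals - served
        # Placeholder reward: throughput minus an over-provisioning penalty.
        reward = served - 0.5 * self.replicas
        return self._observe(), reward, False, {}
```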

Kubernetes HPA Extension

Custom HPA controller that replaces threshold-based scaling decisions with RL agent actions, integrating directly with OpenFaaS provider API and Prometheus metrics.
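The control loop that swaps the threshold rule for agent actions might look roughly like this. The Prometheus address, query, and metric name are illustrative assumptions; the `scale-function` route mirrors the OpenFaaS gateway API, but auth and error handling are omitted.

```python
# Sketch of an RL-driven replacement for the HPA threshold rule
# (hypothetical integration code, not the paper's implementation).
import json
import urllib.parse
import urllib.request

PROM_URL = "http://prometheus:9090/api/v1/query"  # assumed address
GATEWAY = "http://gateway:8080"                   # assumed OpenFaaS gateway

def observe(function):
    """Fetch one observation feature (invocation rate) from Prometheus."""
    query = ('rate(gateway_function_invocation_total'
             '{function_name="%s"}[1m])' % function)
    url = PROM_URL + "?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=5) as resp:
        result = json.load(resp)["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def next_replicas(current, action, min_r=1, max_r=20):
    """Map an agent action in {-1, 0, +1} to a clamped replica target."""
    return max(min_r, min(max_r, current + action))

def apply_action(function, replicas):
    """Scale via the OpenFaaS gateway's scale-function endpoint."""
    body = json.dumps({"serviceName": function,
                       "replicas": replicas}).encode()
    req = urllib.request.Request(
        f"{GATEWAY}/system/scale-function/{function}",
        data=body, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)
```

The controller would call `observe`, feed the observation to the policy, clamp the chosen action with `next_replicas`, and apply it, replacing the HPA's CPU-threshold comparison at each reconciliation tick.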

Workload-Adaptive Learning

On-policy training with the PPO clipped surrogate objective enables stable policy updates across diverse serverless traffic patterns. Outperforms the off-policy DRQN baseline across all evaluated workload traces.
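The clipped surrogate objective referred to here is the standard PPO loss; a minimal PyTorch version is below. The clip range `eps=0.2` is the common default, not necessarily the paper's setting.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective.

    Returns the negated objective so it can be minimised with gradient
    descent. Clipping the probability ratio to [1-eps, 1+eps] bounds
    each policy update, which is what gives the stable on-policy
    training referred to above.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```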

Technical Implementation

Core Technologies

  • Recurrent PPO (RPPO) with LSTM policy network (on-policy)
  • OpenFaaS on MicroK8s v1.27.2 (multi-node cluster)
  • NeCTAR Melbourne Research Cloud infrastructure
  • Prometheus metrics collection for observation space construction
  • PyTorch for neural network implementation and training

RL Design

  • State space: queue length, active replicas, CPU utilisation, latency percentiles
  • Action space: scale up, scale down, or hold replica count
  • Reward function: throughput maximisation subject to SLA penalty
  • PPO clipped surrogate objective for stable on-policy updates
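The "throughput subject to SLA penalty" shape described above could be sketched as a simple scalar reward; the SLA threshold and penalty weight here are placeholder values, not the paper's.

```python
def reward(throughput, p95_latency_ms, sla_ms=500.0, penalty=10.0):
    """Illustrative reward: credit throughput, subtract a penalty
    proportional to how far the latency percentile exceeds the SLA.
    Constants are hypothetical placeholders."""
    sla_violation = max(0.0, p95_latency_ms - sla_ms) / sla_ms
    return throughput - penalty * sla_violation
```

Within the SLA the reward is pure throughput; beyond it, the penalty grows linearly with the violation, discouraging aggressive scale-downs that would save resources at the cost of latency.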

Publication

IEEE Trans. Services Computing · Vol. 17, No. 5 · 2024

A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

Agarwal, S. et al. — DOI: 10.1109/TSC.2024.3387661

View on IEEE Xplore