Serverless RL · Open Source

FaaSTrainGym

A serverless reinforcement learning training platform that integrates OpenFaaS with OpenAI Gym. Decomposes RL training pipelines (rollout collection, reward aggregation, policy updates) into elastic, cost-efficient serverless functions — enabling scalable RL experimentation without managing long-lived training servers.

OpenFaaS · Python · Kubernetes · PyTorch

Why FaaSTrainGym?

Elastic Scaling

Burst data collection and evaluation across many function instances during training peaks, then scale to zero outside training windows.

Modular Pipelines

Compose training stages — environment step, rollout collection, reward calculation, policy update — into independently deployable serverless functions.

Cost Efficiency

Pay for compute only when work runs. Scale-to-zero between training bursts eliminates idle GPU/CPU costs common in dedicated training clusters.

Kubernetes-Native Observability

Metrics and logs accessible through the OpenFaaS Gateway and Prometheus stack, simplifying distributed training diagnostics at scale.

Architecture

A serverless MAPE-inspired loop for RL: Measure rollouts, Analyse rewards, Plan updates, Execute policy changes — each stage implemented as an OpenFaaS function.
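
A minimal sketch of one turn of this loop, assuming the four stage functions listed under Core Functions are deployed behind an OpenFaaS Gateway; the gateway address and payload shapes here are illustrative, not the repo's actual interface:

import requests  # HTTP client for Gateway calls

GATEWAY_URL = "http://gateway.openfaas:8080"  # hypothetical in-cluster address

def invoke(function_name: str, payload: dict) -> dict:
    """Synchronously invoke an OpenFaaS function through the Gateway."""
    resp = requests.post(f"{GATEWAY_URL}/function/{function_name}", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()

def training_iteration(iteration: int) -> None:
    # Measure: collect rollouts from parallel environment workers.
    rollouts = invoke("rollout-collector", {"iteration": iteration, "episodes": 64})
    # Analyse: turn raw trajectories into returns and advantages.
    batch = invoke("reward-aggregator", rollouts)
    # Plan: evaluate the candidate update before committing it.
    report = invoke("policy-evaluator", batch)
    # Execute: apply the gradient step and write a new checkpoint.
    invoke("policy-updater", {"iteration": iteration, "batch": batch, "eval": report})

for i in range(10):
    training_iteration(i)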

Control Plane

Orchestrates training pipelines via OpenFaaS Gateway. Queues rollout requests, fans out invocations, and collects Prometheus metrics for training state.
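
OpenFaaS also exposes an asynchronous path (/async-function/&lt;name&gt;) backed by a queue-worker, which suits bursty rollout fan-out; a sketch, with a hypothetical callback service collecting results:

import json
import requests

GATEWAY_URL = "http://gateway.openfaas:8080"           # hypothetical Gateway address
CALLBACK_URL = "http://collector.pipeline:8080/done"   # hypothetical result sink

def fan_out_rollouts(num_workers: int, episodes_per_worker: int) -> None:
    """Queue rollout requests; each is accepted with HTTP 202 and executed
    by the queue-worker, which POSTs the result to the X-Callback-Url."""
    for worker_id in range(num_workers):
        payload = {"worker_id": worker_id, "episodes": episodes_per_worker}
        resp = requests.post(
            f"{GATEWAY_URL}/async-function/rollout-collector",
            data=json.dumps(payload),
            headers={"X-Callback-Url": CALLBACK_URL},
            timeout=10,
        )
        resp.raise_for_status()  # async submissions return 202 Accepted

fan_out_rollouts(num_workers=100, episodes_per_worker=4)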

Environment Workers

Gym episodes and steps executed in parallel across stateless function instances. Trajectories and episode summaries persisted externally in S3/MinIO.
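
A sketch of what a worker's handler might look like, assuming the classic OpenFaaS python3 template (a handle(req) entry point), the pre-Gymnasium gym step API, and a hypothetical MinIO bucket; a random policy stands in for real policy inference:

# handler.py
import json
import os
import boto3
import gym

s3 = boto3.client("s3", endpoint_url=os.environ.get("S3_ENDPOINT", "http://minio:9000"))
BUCKET = os.environ.get("TRAJECTORY_BUCKET", "trajectories")  # hypothetical bucket

def handle(req: str) -> str:
    params = json.loads(req)
    env = gym.make(params.get("env_id", "CartPole-v1"))
    obs, done, total_reward, trajectory = env.reset(), False, 0.0, []
    while not done:
        action = env.action_space.sample()  # placeholder; real workers load policy weights
        next_obs, reward, done, _ = env.step(action)
        trajectory.append((obs.tolist(), int(action), float(reward)))
        obs, total_reward = next_obs, total_reward + reward
    key = f"rollouts/worker-{params.get('worker_id', 0)}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(trajectory))
    return json.dumps({"trajectory_key": key, "episode_return": total_reward})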

State Stores

Replay buffers, model checkpoints, and episode summaries persisted in S3/DB, enabling stateless function design with full training resumability.
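
Checkpoints, for example, can be serialised in memory and pushed straight to object storage so no weights ever live on a function's filesystem. A minimal sketch using boto3 and PyTorch, with hypothetical bucket and key names:

import io
import os
import boto3
import torch
import torch.nn as nn

s3 = boto3.client("s3", endpoint_url=os.environ.get("S3_ENDPOINT", "http://minio:9000"))
BUCKET = "checkpoints"  # hypothetical bucket

def save_checkpoint(model: nn.Module, key: str) -> None:
    """Serialise weights to memory and upload, keeping the function stateless."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    s3.put_object(Bucket=BUCKET, Key=key, Body=buffer.getvalue())

def load_checkpoint(model: nn.Module, key: str) -> nn.Module:
    """Fetch the latest weights at cold start so training can resume anywhere."""
    obj = s3.get_object(Bucket=BUCKET, Key=key)
    model.load_state_dict(torch.load(io.BytesIO(obj["Body"].read()), map_location="cpu"))
    return model

policy = nn.Linear(4, 2)  # toy policy network
save_checkpoint(policy, "policy/iter-0.pt")
load_checkpoint(policy, "policy/iter-0.pt")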

Core Functions

  • rollout-collector — parallel env steps producing trajectories
  • reward-aggregator — compute returns and GAE advantages (sketched below)
  • policy-evaluator — off/on-policy evaluation
  • policy-updater — gradient update and checkpoint
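
The reward-aggregator stage reduces to a standard GAE(λ) computation over collected trajectories. A self-contained sketch of that calculation; array shapes and the bootstrap-value argument are assumptions rather than the repo's interface:

import numpy as np

def gae_advantages(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """delta_t = r_t + gamma*V(s_{t+1}) - V(s_t);  A_t = delta_t + gamma*lam*A_{t+1}."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    next_value, next_advantage = last_value, 0.0
    for t in reversed(range(T)):
        mask = 1.0 - dones[t]  # stop bootstrapping across episode boundaries
        delta = rewards[t] + gamma * next_value * mask - values[t]
        next_advantage = delta + gamma * lam * next_advantage * mask
        advantages[t] = next_advantage
        next_value = values[t]
    returns = advantages + values  # regression targets for the value function
    return advantages, returns

adv, ret = gae_advantages(
    rewards=np.array([1.0, 1.0, 1.0]),
    values=np.array([0.5, 0.5, 0.5]),
    dones=np.array([0.0, 0.0, 1.0]),
    last_value=0.0,
)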

Operations Guidance

  • Use async queueing for bursty rollout requests
  • Externalise state — S3/MinIO for buffers, Redis/DB for metadata
  • Tune per-function memory and timeout settings to model size
  • Enable distributed traces and Prometheus metrics for bottleneck analysis

Ideal Use Cases

Massive Parallel Rollouts

Scale out environment interaction across hundreds of short-lived worker functions where data collection and evaluation dominate training compute.

Algorithm A/B Experimentation

Spin up competing RL configurations and policies as independent function graphs. Collect comparative metrics without dedicated experiment infrastructure.

MLOps Integration

Plug RL training into CI/CD pipelines. Store artifacts centrally and drive training experiments via event triggers from data pipelines or model registries.

Getting Started

  1. Deploy OpenFaaS on Kubernetes and ensure CLI access via faas-cli.
  2. Clone the repo and build the functions: github.com/SidAg26/FaaSTrainGym
  3. Configure external storage (S3/MinIO + Redis) for trajectories, checkpoints, and replay buffers.
  4. Invoke the rollout and evaluation functions and monitor training metrics via the Gateway and Prometheus, as sketched below.
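
A sketch of step 4, assuming port-forwarded Gateway and Prometheus endpoints; the PromQL query reads the Gateway's invocation counter, gateway_function_invocation_total:

import requests

GATEWAY_URL = "http://127.0.0.1:8080"     # e.g. after kubectl port-forward
PROMETHEUS_URL = "http://127.0.0.1:9090"  # Prometheus from the OpenFaaS stack

# Kick off a small rollout batch through the Gateway.
requests.post(f"{GATEWAY_URL}/function/rollout-collector", json={"episodes": 8}, timeout=120)

# Ask Prometheus how often the function has been invoked, and with which status codes.
resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": 'gateway_function_invocation_total{function_name=~"rollout-collector.*"}'},
    timeout=10,
)
for series in resp.json()["data"]["result"]:
    print(series["metric"].get("code"), series["value"][1])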

Publication

ICCIDA 2024 · Published

On-Demand Cold Start Frequency Reduction with Off-Policy Reinforcement Learning in Serverless Computing

Agarwal, S. et al. — International Conference on Computational Intelligence, Data Science and Applications (ICCIDA) 2024
