Inference workflows

Inference from a template. Not a script.

Reusable, governed templates for deploying AI inference environments on GPU VMs. Platform teams define the standard. Application teams self-serve. Every deployment governed and auditable.

Governed templates that turn inference deployment into a repeatable operation.

Reusable inference templates

Single-model and multi-model workflow templates. DevOps/platform teams define VM limits, setup scripts, and parameterized inputs. Standardizes inference deployment across teams.

Self-service with governance

End users launch via UI or API. Guardrails enforced: instance size limits, parameter constraints, RBAC. Platform teams control what's deployable. No ungoverned GPU sprawl.

Automated application setup

Shell scripts install inference servers, dependencies, and drivers, and open the required ports. Scripts execute automatically post-provisioning. Application teams deploy without scripting or infrastructure expertise.
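
For illustration, here is a minimal sketch of the kind of logic such a setup script automates. It is written in Python purely for readability; real workflow scripts are shell, and the package, port, and model path below are placeholders rather than anything emma ships.

    # Illustrative post-provisioning setup logic (placeholders throughout).
    # A real emma workflow runs an equivalent shell script automatically.
    import subprocess

    def setup_inference_server(model_path: str, port: int = 8000) -> None:
        # Install the inference server and its dependencies.
        subprocess.run(["pip", "install", "vllm"], check=True)
        # Open the serving port (placeholder firewall rule).
        subprocess.run(["ufw", "allow", f"{port}/tcp"], check=True)
        # Launch the server against the parameterized model path.
        subprocess.Popen([
            "python", "-m", "vllm.entrypoints.openai.api_server",
            "--model", model_path, "--port", str(port),
        ])

    setup_inference_server("/models/example-model")

The point of the template is that this logic is written once by the platform team and then executed automatically on every deployment.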

[Screenshot: workflow builder — template with parameters, and the template library view]
From trained model to running endpoint in four steps.
  1. Platform team creates a template

     Define the GPU VM configuration, setup script, instance limits, and parameterized inputs. Set guardrails: maximum instance size, allowed regions, RBAC permissions. Publish to the template library.

  2. Application team selects and configures

     Browse the approved template library. Choose a template. Fill in parameters — model path, runtime configuration, environment variables. No infrastructure decisions required (see the API sketch after these steps).

  3. emma provisions and deploys

     GPU VM provisioned automatically. Setup script executes: inference server installed, dependencies configured, ports opened, model loaded. The endpoint is live — governed, monitored, attributable.

  4. Operate, govern, iterate

     Deployed instance behaves as a standard emma GPU VM — same monitoring, same governance, same cost attribution. Next deployment starts from the library, not from scratch.
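
To make the self-service step concrete, here is a sketch of what deploying from a template via the API could look like. emma's actual endpoints and payload schema are not shown on this page, so every URL, field name, and value below is a hypothetical placeholder.

    # Hypothetical deploy-from-template call; all names are placeholders.
    import requests

    response = requests.post(
        "https://emma.example.com/api/v1/workflow-deployments",  # placeholder URL
        headers={"Authorization": "Bearer <API_TOKEN>"},
        json={
            "template_id": "inference-single-model",   # picked from the library
            "parameters": {                            # only what the template exposes
                "model_path": "s3://models/example-model",
                "runtime_config": {"max_batch_size": 8},
                "env": {"LOG_LEVEL": "info"},
            },
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())  # e.g. a deployment id and, once live, the endpoint URL

Everything outside "parameters" is decided by the template; the caller never specifies instance types, regions, or networking.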

Every inference deployment is a snowflake. And nobody can trace it.
Before emma

  • Every inference deployment scripted from scratch
  • No standardization across teams or models
  • Runtime configuration embedded in one-off scripts
  • Data scientists blocked by infrastructure setup
  • Auditor asks what’s running — nobody knows

After emma

  • Deploy from an approved template library
  • Same deployment pattern for every team and model
  • Parameters separated from infrastructure logic
  • Self-service within guardrails — no infrastructure ticket
  • Every deployment governed and auditable
Platform teams define the standard. Application teams self-serve.

Platform Engineers / DevOps

  • Create and maintain workflow templates
  • Set guardrails: instance limits, allowed GPU types, regions
  • Define setup scripts and dependencies
  • Manage the template lifecycle: Draft → Publish → Govern
  • Build the institutional library that outlasts any individual

Data Scientists / ML Engineers

  • Browse approved templates in the library
  • Fill in parameters: model path, config, environment
  • Deploy via UI or API — no infrastructure decisions
  • Get a running inference endpoint in minutes
  • Focus on models, not infrastructure
The governed middle ground between scripting from scratch and vendor lock-in.

Internal platform tooling

Custom Airflow + Terraform + Helm pipelines built by the team that’s no longer here. Fragile, ungoverned, and impossible to hand off. Every new model deployment is a re-engineering project.


SageMaker / Vertex endpoints

Managed inference — but locked to one provider. Opinionated runtimes. Single-cloud only. If you run multi-cloud GPU infrastructure, you can't standardize inference across it with provider-native tools.


emma Inference workflows

Governed templates that work across any emma-connected provider. Instance limits and RBAC, built in. Deployed VMs behave as standard emma resources with GPU monitoring and cost attribution.

The endpoint is live. Now what?

An inference workflow doesn't end at deployment. The running instance is a standard emma GPU VM — fully integrated with the rest of the platform.

GPU monitoring included

Utilization, vRAM, memory — visible in emma's monitoring interface from the moment the endpoint boots. Know whether your inference GPU is working or idle.

Cost attributed automatically

The GPU VM running your inference endpoint is tagged and attributed per team and project. No manual cost allocation. Visible in the same dashboard as everything else.

Governed like everything else

RBAC, tagging, audit trail — all applied. The inference endpoint is within your governance perimeter from the moment it's provisioned. No shadow endpoints.

Inference workflows on emma
What exactly is an Inference Workflow?

A reusable, governed template that defines how to deploy an AI inference environment on a GPU VM. It includes the VM configuration, a setup script (dependencies, inference server, ports), parameterized inputs, and guardrails (instance limits, RBAC). Application teams deploy from the template — no infrastructure scripting required.
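
As a rough sketch of how those pieces fit together, here is the anatomy of such a template expressed as a Python structure. The field names and values are illustrative assumptions, not emma's real schema.

    # Illustrative template anatomy (invented field names, not emma's schema).
    template = {
        "name": "inference-single-model",
        "vm": {"gpu_type": "A100", "vcpus": 16, "memory_gb": 64},  # VM configuration
        "setup_script": "scripts/install_inference_server.sh",     # runs post-provisioning
        "parameters": {                                             # what app teams fill in
            "model_path": {"type": "string", "required": True},
            "port": {"type": "int", "default": 8000},
        },
        "guardrails": {                                             # enforced at deploy time
            "max_instance_size": "A100",
            "allowed_regions": ["eu-west-1", "us-east-1"],
            "rbac": {"deploy": ["ml-engineers"]},
        },
    }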

Is this an MLOps platform?

No. Inference workflows handle the infrastructure deployment layer — getting a governed GPU VM running with the right software and configuration. They don't handle model training, experiment tracking, feature stores, or model registries. Your existing MLOps tools (MLflow, Kubeflow, Weights & Biases) continue to manage the ML lifecycle. emma handles the infrastructure underneath.


Can workflows deploy to Kubernetes clusters?

Not in the current release. Inference workflows deploy to GPU VMs only. Kubernetes-based inference deployment is on the roadmap.

Does it support autoscaling?

Not in the current release. Each workflow deploys a single GPU VM. Autoscaling, self-healing, and orchestration across multiple instances are on the roadmap.

What guardrails can platform teams set?

Instance size limits (maximum GPU type, vCPU, memory), parameter constraints (allowed values for inputs), RBAC permissions (who can deploy from which templates), and region restrictions. Application teams self-serve within these boundaries.
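
A minimal sketch of how those boundaries could gate a deployment request, with invented limits and role names (the real enforcement happens server-side in emma):

    # Illustrative guardrail check (invented limits, not emma's engine).
    GPU_SIZE_RANK = {"T4": 1, "A10G": 2, "A100": 3}

    def check_deploy_request(gpu_type: str, region: str, role: str) -> None:
        # Instance size limit: reject anything larger than the template allows.
        if GPU_SIZE_RANK.get(gpu_type, 99) > GPU_SIZE_RANK["A10G"]:
            raise PermissionError(f"{gpu_type} exceeds the template's size limit")
        # Region restriction.
        if region not in {"eu-west-1", "us-east-1"}:
            raise PermissionError(f"region {region} is not allowed")
        # RBAC: only approved roles may deploy from this template.
        if role not in {"ml-engineers"}:
            raise PermissionError(f"role '{role}' cannot deploy this template")

    check_deploy_request("A10G", "eu-west-1", "ml-engineers")  # passes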

What happens to the GPU VM after deployment?

It behaves like any other emma GPU VM. GPU monitoring is available (utilization, vRAM, memory). Cost is attributed per team and project. Governance policies apply. The VM has standard lifecycle controls: start, stop, delete. It's inside your governance perimeter from the moment it boots.

Can I integrate workflows with my CI/CD pipeline?

Not in the current release. Workflows are triggered via the emma UI or API. Git integration and CI/CD pipeline triggers are on the roadmap.

See a governed inference deployment with auditor-readable output.

45-minute walkthrough. Templates, guardrails, and a running endpoint.

Get a demo