Inference workflows

Inference from a template. Not a script.

Reusable, governed templates for deploying AI inference environments on GPU VMs. Platform teams define the standard. Application teams self-serve. Every deployment governed and auditable.

Governed templates that turn inference deployment into a repeatable operation.

Reusable inference templates

Single-model and multi-model workflow templates. DevOps/platform teams define VM limits, setup scripts, and parameterized inputs. Standardizes inference deployment across teams.

Self-service with governance

End users launch via UI or API. Guardrails enforced: instance size limits, parameter constraints, RBAC. Platform teams control what's deployable. No ungoverned GPU sprawl.

Automated application setup

Shell scripts install inference servers, dependencies, and drivers, and open the required ports. Scripts execute automatically post-provisioning. Application teams deploy without scripting or infrastructure expertise.
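
For illustration, here is a minimal sketch of the kind of logic such a setup script automates. It is written in Python purely for readability; real workflow scripts are shell, and the package, port, and model path below are placeholders rather than anything emma ships.

    # Illustrative post-provisioning setup logic (placeholders throughout).
    # A real emma workflow runs an equivalent shell script automatically.
    import subprocess

    def setup_inference_server(model_path: str, port: int = 8000) -> None:
        # Install the inference server and its dependencies.
        subprocess.run(["pip", "install", "vllm"], check=True)
        # Open the serving port (placeholder firewall rule).
        subprocess.run(["ufw", "allow", f"{port}/tcp"], check=True)
        # Launch the server against the parameterized model path.
        subprocess.Popen([
            "python", "-m", "vllm.entrypoints.openai.api_server",
            "--model", model_path, "--port", str(port),
        ])

    setup_inference_server("/models/example-model")

The point of the template is that this logic is written once by the platform team and then executed automatically on every deployment.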

[Screenshot: workflow builder — template with parameters, and the template library view]
From trained model to running endpoint in four steps.
  1. Platform team creates a template

     Define the GPU VM configuration, setup script, instance limits, and parameterized inputs. Set guardrails: maximum instance size, allowed regions, RBAC permissions. Publish to the template library.

  2. Application team selects and configures

     Browse the approved template library. Choose a template. Fill in parameters — model path, runtime configuration, environment variables. No infrastructure decisions required (see the API sketch after these steps).

  3. emma provisions and deploys

     GPU VM provisioned automatically. Setup script executes: inference server installed, dependencies configured, ports opened, model loaded. The endpoint is live — governed, monitored, attributable.

  4. Operate, govern, iterate

     Deployed instance behaves as a standard emma GPU VM — same monitoring, same governance, same cost attribution. Next deployment starts from the library, not from scratch.
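
To make the self-service step concrete, here is a sketch of what deploying from a template via the API could look like. emma's actual endpoints and payload schema are not shown on this page, so every URL, field name, and value below is a hypothetical placeholder.

    # Hypothetical deploy-from-template call; all names are placeholders.
    import requests

    response = requests.post(
        "https://emma.example.com/api/v1/workflow-deployments",  # placeholder URL
        headers={"Authorization": "Bearer <API_TOKEN>"},
        json={
            "template_id": "inference-single-model",   # picked from the library
            "parameters": {                            # only what the template exposes
                "model_path": "s3://models/example-model",
                "runtime_config": {"max_batch_size": 8},
                "env": {"LOG_LEVEL": "info"},
            },
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())  # e.g. a deployment id and, once live, the endpoint URL

Everything outside "parameters" is decided by the template; the caller never specifies instance types, regions, or networking.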

Every inference deployment is a snowflake. And nobody can trace it.
Before emma

  • Every inference deployment scripted from scratch
  • No standardization across teams or models
  • Runtime configuration embedded in one-off scripts
  • Data scientists blocked by infrastructure setup
  • Auditor asks what’s running — nobody knows

After emma

  • Deploy from an approved template library
  • Same deployment pattern for every team and model
  • Parameters separated from infrastructure logic
  • Self-service within guardrails — no infrastructure ticket
  • Every deployment governed and auditable
Platform teams define the standard. Application teams self-serve.

Platform Engineers / DevOps

  • Create and maintain workflow templates
  • Set guardrails: instance limits, allowed GPU types, regions
  • Define setup scripts and dependencies
  • Manage the template lifecycle: Draft → Publish → Govern
  • Build the institutional library that outlasts any individual

Data Scientists / ML Engineers

  • Browse approved templates in the library
  • Fill in parameters: model path, config, environment
  • Deploy via UI or API — no infrastructure decisions
  • Get a running inference endpoint in minutes
  • Focus on models, not infrastructure
The governed middle ground between scripting from scratch and vendor lock-in.

Internal platform tooling

Custom Airflow + Terraform + Helm pipelines built by the team that’s no longer here. Fragile, ungoverned, and impossible to hand off. Every new model deployment is a re-engineering project.


SageMaker / Vertex endpoints

Managed inference — but locked to one provider. Opinionated runtimes. Single-cloud only. If you run multi-cloud GPU infrastructure, you can't standardize inference across it with provider-native tools.


emma Inference workflows

Governed templates that work across any emma-connected provider. Instance limits and RBAC, built in. Deployed VMs behave as standard emma resources with GPU monitoring and cost attribution.

The endpoint is live. Now what?

An inference workflow doesn't end at deployment. The running instance is a standard emma GPU VM — fully integrated with the rest of the platform.

GPU monitoring included

Utilization, vRAM, memory — visible in emma's monitoring interface from the moment the endpoint boots. Know whether your inference GPU is working or idle.

Cost attributed automatically

The GPU VM running your inference endpoint is tagged and attributed per team and project. No manual cost allocation. Visible in the same dashboard as everything else.

Governed like everything else

RBAC, tagging, audit trail — all applied. The inference endpoint is within your governance perimeter from the moment it's provisioned. No shadow endpoints.

Inference workflows on emma
What exactly is an Inference Workflow?

A reusable, governed template that defines how to deploy an AI inference environment on a GPU VM. It includes the VM configuration, a setup script (dependencies, inference server, ports), parameterized inputs, and guardrails (instance limits, RBAC). Application teams deploy from the template — no infrastructure scripting required.
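
As a rough sketch of how those pieces fit together, here is the anatomy of such a template expressed as a Python structure. The field names and values are illustrative assumptions, not emma's real schema.

    # Illustrative template anatomy (invented field names, not emma's schema).
    template = {
        "name": "inference-single-model",
        "vm": {"gpu_type": "A100", "vcpus": 16, "memory_gb": 64},  # VM configuration
        "setup_script": "scripts/install_inference_server.sh",     # runs post-provisioning
        "parameters": {                                             # what app teams fill in
            "model_path": {"type": "string", "required": True},
            "port": {"type": "int", "default": 8000},
        },
        "guardrails": {                                             # enforced at deploy time
            "max_instance_size": "A100",
            "allowed_regions": ["eu-west-1", "us-east-1"],
            "rbac": {"deploy": ["ml-engineers"]},
        },
    }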

Is this an MLOps platform?

No. Inference workflows handle the infrastructure deployment layer — getting a governed GPU VM running with the right software and configuration. They don't handle model training, experiment tracking, feature stores, or model registries. Your existing MLOps tools (MLflow, Kubeflow, Weights & Biases) continue to manage the ML lifecycle. emma handles the infrastructure underneath.


Can workflows deploy to Kubernetes clusters?

Not in the current release. Inference workflows deploy to GPU VMs only. Kubernetes-based inference deployment is on the roadmap.

Does it support autoscaling?

Not in the current release. Each workflow deploys a single GPU VM. Autoscaling, self-healing, and orchestration across multiple instances are on the roadmap.

What guardrails can platform teams set?

Instance size limits (maximum GPU type, vCPU, memory), parameter constraints (allowed values for inputs), RBAC permissions (who can deploy from which templates), and region restrictions. Application teams self-serve within these boundaries.
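
A minimal sketch of how those boundaries could gate a deployment request, with invented limits and role names (the real enforcement happens server-side in emma):

    # Illustrative guardrail check (invented limits, not emma's engine).
    GPU_SIZE_RANK = {"T4": 1, "A10G": 2, "A100": 3}

    def check_deploy_request(gpu_type: str, region: str, role: str) -> None:
        # Instance size limit: reject anything larger than the template allows.
        if GPU_SIZE_RANK.get(gpu_type, 99) > GPU_SIZE_RANK["A10G"]:
            raise PermissionError(f"{gpu_type} exceeds the template's size limit")
        # Region restriction.
        if region not in {"eu-west-1", "us-east-1"}:
            raise PermissionError(f"region {region} is not allowed")
        # RBAC: only approved roles may deploy from this template.
        if role not in {"ml-engineers"}:
            raise PermissionError(f"role '{role}' cannot deploy this template")

    check_deploy_request("A10G", "eu-west-1", "ml-engineers")  # passes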

What happens to the GPU VM after deployment?

It behaves like any other emma GPU VM. GPU monitoring is available (utilization, vRAM, memory). Cost is attributed per team and project. Governance policies apply. The VM has standard lifecycle controls: start, stop, delete. It's inside your governance perimeter from the moment it boots.

Can I integrate workflows with my CI/CD pipeline?

Not in the current release. Workflows are triggered via the emma UI or API. Git integration and CI/CD pipeline triggers are on the roadmap.

See a governed inference deployment with auditor-readable output.

45-minute walkthrough. Templates, guardrails, and a running endpoint.

Get a demo