On-Premise Deployment

Self-hosted open-source models, your infrastructure, your data. Full control — and full responsibility for every layer of the stack.

[Architecture diagram: system components, security concerns, and network boundaries]

Security Surface

On-premise deployments eliminate API-level data exposure but introduce infrastructure-level responsibilities. Every component you host is a component you must secure.

Model Weight Security [HIGH]

Open-source model weights stored on local servers are a high-value target. If an attacker gains access to the model files, they can extract training-data artefacts and fine-tuning data, or replace the weights with a poisoned model.

Mitigation: Encrypted storage, access controls on model directories, integrity checksums, air-gapped model loading.
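Integrity checksums are the simplest of these mitigations to automate. A minimal sketch, assuming a trusted manifest of SHA-256 hashes (the `verify_model_dir` helper and manifest format here are illustrative, not a standard tool):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB weight shards fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model_dir(model_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Compare each file against the trusted manifest; return any mismatches."""
    mismatches = []
    for rel_path, expected in manifest.items():
        if sha256_of(model_dir / rel_path) != expected:
            mismatches.append(rel_path)
    return mismatches
```

Run the check at every model load, not just at download time, so a tampered file on disk is caught before it reaches the GPU.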
Prompt Injection [HIGH]

Open-source models may have weaker built-in guardrails than commercial APIs. Prompt injection attacks — where user input manipulates the model's instructions — can be more effective against self-hosted models without additional hardening.

Mitigation: Input sanitisation layer, system prompt isolation, output validation, dedicated guardrail models (e.g., Llama Guard).
Inference Server Exposure [MEDIUM]

Self-hosted inference servers (vLLM, Ollama, TGI) expose HTTP endpoints internally. If network segmentation is poor, these become lateral movement targets — an attacker on the network can query your model directly.

Mitigation: Network segmentation, mTLS between services, auth on inference endpoints, no public exposure.
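Even behind segmentation and mTLS, the endpoint itself should authenticate callers. A minimal sketch of a constant-time bearer-token check (in practice this logic usually lives in a reverse proxy or gateway in front of vLLM/Ollama/TGI, not in application code):

```python
import hmac

def authorize(headers: dict[str, str], expected_token: str) -> bool:
    """Constant-time bearer-token check for an internal inference endpoint.

    hmac.compare_digest avoids timing side-channels that a plain `==`
    string comparison would leak.
    """
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    supplied = auth[len("Bearer "):]
    return hmac.compare_digest(supplied, expected_token)
```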
Data Pipeline Leakage [MEDIUM]

RAG pipelines, embedding stores, and context databases all contain sensitive business data. On-prem means this data never leaves your network — but internal access controls are critical.

Mitigation: Role-based access to vector stores, encryption at rest, audit logging on data access.
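Role-based access and audit logging go together: every query against the vector store is both authorised and recorded. A minimal sketch with a hypothetical role-to-collection mapping (real systems would pull this from an IAM service rather than a hard-coded dict):

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("vector_store.audit")

# Hypothetical role -> collection grants, for illustration only.
ROLE_COLLECTIONS = {
    "support_agent": {"faq", "tickets"},
    "finance_analyst": {"invoices"},
}

def query_allowed(role: str, collection: str) -> bool:
    """Check role-based access and write an audit record either way."""
    allowed = collection in ROLE_COLLECTIONS.get(role, set())
    audit_log.info(
        "%s role=%s collection=%s allowed=%s",
        datetime.now(timezone.utc).isoformat(), role, collection, allowed,
    )
    return allowed
```

Logging denials as well as grants matters: repeated denials from one role are often the first visible sign of lateral movement.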
MCP Server Attack Surface [MEDIUM]

MCP servers bridge your agents to internal tools and databases. Each server is an authenticated connection point — a compromised MCP server gives an attacker the same access the agent has.

Mitigation: Principle of least privilege per MCP server, credential rotation, connection audit logging.
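Least privilege per MCP server means each server exposes only the tools its agent actually needs, so a compromise is contained to that grant. A minimal sketch with hypothetical server names and tool allowlists:

```python
# Hypothetical per-server tool allowlists, for illustration: a compromised
# "crm-readonly" server cannot create tickets, only read CRM data.
SERVER_TOOL_ALLOWLIST = {
    "crm-readonly": {"search_contacts", "get_account"},
    "ticketing": {"create_ticket", "get_ticket"},
}

def dispatch_tool_call(server: str, tool: str, handler, *args):
    """Refuse any tool not explicitly granted to this MCP server."""
    granted = SERVER_TOOL_ALLOWLIST.get(server, set())
    if tool not in granted:
        raise PermissionError(f"{server} is not granted tool {tool!r}")
    return handler(*args)
```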
Supply Chain (Model Downloads) [LOW]

Downloading models from Hugging Face or other registries introduces supply chain risk. Compromised model weights could contain backdoors or biased behaviour not present in the original release.

Mitigation: Verify model checksums, use pinned versions, scan weights with model security tools, maintain an internal model registry.
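An internal registry can enforce the pinning rule mechanically: production code never fetches a floating ref like "main", only an approved commit hash. A minimal sketch (the registry structure and entries are hypothetical):

```python
import re

# Hypothetical internal registry: only models pinned to an exact commit
# hash may be pulled into production. Entries here are illustrative.
APPROVED_MODELS = {
    "meta-llama/Llama-3.1-8B-Instruct": {
        "revision": "0e9e39f249a16976918f6564b8830bc894c89659",  # example hash
    },
}

def resolve_pinned(repo_id: str) -> str:
    """Return the approved commit hash for a model, refusing floating refs."""
    entry = APPROVED_MODELS.get(repo_id)
    if entry is None:
        raise PermissionError(f"{repo_id} is not in the internal registry")
    revision = entry["revision"]
    if not re.fullmatch(r"[0-9a-f]{40}", revision):
        raise ValueError(f"{repo_id} must be pinned to a full commit hash")
    return revision
```

The resolved hash can then be passed as the `revision` argument to huggingface_hub's `snapshot_download`, so the default branch is never fetched implicitly.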

Why On-Premise?

Data Sovereignty

No data leaves your network. Required for regulated industries (healthcare, finance, government) and organisations with strict data residency requirements.

Cost at Scale

API costs scale linearly with usage. Self-hosted inference has high fixed costs but low marginal costs — at volume, it's significantly cheaper.
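The break-even point falls out of simple arithmetic: fixed monthly cost divided by the per-token saving. A sketch with illustrative numbers only (none of these prices are quotes):

```python
def breakeven_mtok_per_month(
    gpu_monthly_cost: float,         # fixed: amortised hardware, power, ops
    api_cost_per_mtok: float,        # commercial API price per million tokens
    self_host_cost_per_mtok: float,  # marginal self-hosted cost per million tokens
) -> float:
    """Monthly volume (millions of tokens) at which self-hosting breaks even."""
    saving_per_mtok = api_cost_per_mtok - self_host_cost_per_mtok
    if saving_per_mtok <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return gpu_monthly_cost / saving_per_mtok

# Illustrative: a $4,000/month GPU node vs an API at $10/Mtok, with
# ~$0.50/Mtok marginal self-hosted cost -> break-even near 421 Mtok/month.
```

Below that volume the API is cheaper; above it, every additional token widens the gap in favour of self-hosting.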

Customisation

Fine-tune models on your data, control tokenisation, adjust inference parameters, and run specialised models not available via API.

Latency Control

No round-trip to external APIs. Co-locate inference with your data for sub-100ms response times when it matters.

Considering an on-premise deployment?

I help plan and build self-hosted AI infrastructure — from GPU provisioning to production monitoring.