🏛️

Copilot Project: Baden-Württemberg Government

Open-Source AI Platform for Digital Sovereignty

100% Open Source
1000s of Government Users
6 Architecture Layers
BSI Compliance

Project Introduction

In response to growing concerns about dependency on foreign technology companies amid the rapid advancement of AI and digitalization, the German government, along with other European nations, has taken proactive steps to establish technological sovereignty. The state of Baden-Württemberg initiated the Copilot Project as a pioneering effort to create a reference architecture for a fully open-source, self-hosted AI platform.

The primary motivation was to reduce reliance on proprietary, foreign-owned AI services while maintaining cutting-edge AI capabilities. The project aimed to demonstrate that governments could operate sophisticated AI platforms with complete control over their data, infrastructure, and AI models, ensuring compliance with European data protection regulations (GDPR) and maintaining digital sovereignty.

The resulting platform provides a ChatGPT-like user experience through OpenWebUI, while maintaining complete control over the entire technology stack, from the user interface down to the GPU-accelerated model inference layer.

Critical Requirements

✓ Complete Open-Source Stack
✓ Self-Hosted AI Models
✓ Service-Agnostic Architecture
✓ RAG (Retrieval-Augmented Generation) Capabilities
✓ Multi-User Support with Governance
✓ Agentic Workflows
✓ Enterprise-Grade Scalability
✓ BSI Compliance

Project Complexity & Challenges

🔧

Open-Source Component Integration

Integrating multiple open-source projects with different maturity levels into a cohesive platform. OpenWebUI required extensive customization, multiple authentication systems needed seamless integration, and version compatibility management was critical.

🤖

Multi-Model Management

Supporting diverse AI models from 7B to 70B+ parameters with vastly different resource requirements. Choosing between FP16 weights and 8-bit or 4-bit quantization meant trading output quality against memory footprint and throughput.
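As a rough illustration of that tradeoff, weight memory scales with parameter count times bits per parameter. The sketch below assumes nominal 7B and 70B sizes and counts weights only; it ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
# Rough estimate of GPU memory needed just for model weights.
# Assumption: weights only; KV cache, activations and framework
# overhead are ignored, so actual requirements are higher.

BITS_PER_PARAM = {"FP16": 16, "8-bit": 8, "4-bit": 4}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return weight memory in gigabytes (10**9 bytes)."""
    bits = BITS_PER_PARAM[precision]
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for size in (7e9, 70e9):
        for precision in BITS_PER_PARAM:
            gb = weight_memory_gb(size, precision)
            print(f"{size / 1e9:.0f}B @ {precision}: ~{gb:.1f} GB")
```

By this estimate a 70B model needs about 140 GB for FP16 weights alone, more than a single 80 GB GPU holds, while a 4-bit version needs about 35 GB. That gap is why quantization and multi-GPU tensor parallelism matter for the larger models.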

🔐

Security and Compliance

Meeting stringent government security requirements, GDPR compliance, and BSI guidelines. End-to-end encryption, comprehensive audit logging, RBAC with fine-grained permissions, and data residency requirements within German/EU borders.

📚

RAG Implementation Complexity

Building reliable RAG pipelines for securely accessing government documents. Vector database optimization for large-scale collections, embedding model selection for German language, and access control integration.

📈

Infrastructure Scalability

Designing infrastructure to scale from development to production serving thousands of concurrent users. GPU resource allocation in Kubernetes, auto-scaling for CPU and GPU workloads, and cost optimization.

💰

Cost Tracking and Governance

Implementing comprehensive cost tracking, usage quotas, and billing mechanisms for a multi-tenant government platform: token-level cost calculation, department-level budget allocation, and API rate limiting.

🔑

Authentication and Federation

Integrating with existing government identity providers, supporting SAML and OAuth2/OIDC, eID system integration, SSO across all platform components, and role synchronization from AD/LDAP.

⚙️

DevOps and GitOps Complexity

Establishing robust CI/CD pipelines with stringent change management. Managing Helm charts for dozens of microservices, GitOps workflows with ArgoCD, model versioning, and compliance scanning.

Solution Architecture & Implementation

Six-Layer Architecture

The Copilot platform consists of six primary layers working together to deliver a complete AI platform built on modularity, scalability, security-by-design, and observability principles.

  • 🖥️ Layer 1 (Frontend): OpenWebUI, user-facing chat interface
  • 🚪 Layer 2 (API Gateway): Kong Gateway, routing, load balancing, reverse proxy
  • 🔐 Layer 3 (Authentication): Keycloak, identity and access management with SSO
  • 🔄 Layer 4 (LLM Gateway): LiteLLM Proxy, model routing, cost tracking, quotas
  • 🤖 Layer 5 (Model Serving): vLLM, GPU-accelerated model inference
  • 📊 Layer 6 (Data & RAG): Qdrant and PostgreSQL, vector search and document storage

Detailed Layer Implementations

🖥️

Layer 1: Frontend - OpenWebUI

ChatGPT-like user experience with government customization

Government Customization
  • Complete branding with government identity standards
  • WCAG 2.1 Level AA accessibility compliance
  • Multi-language support (German/English)
  • Document upload with classification metadata
  • Automatic session timeout and security indicators
Features
  • Conversation management with cost tracking
  • Model selection interface with real-time status
  • RAG integration with document transparency
  • Export conversations for record-keeping
  • Horizontally scalable deployment

Layer 2: API Gateway - Kong Gateway

Single entry point for all platform requests

Core Functions
  • Intelligent request routing
  • SSL/TLS termination
  • Load balancing across services
  • Dynamic route configuration
  • Health checking
Security
  • JWT token validation
  • Rate limiting and quotas
  • IP allowlisting/denylisting
  • Request validation
  • Circuit breaker patterns
Observability
  • Comprehensive logging
  • Prometheus metrics export
  • Distributed tracing
  • Kong Manager dashboard
  • Real-time monitoring
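In practice, gateway rate limiting is configured through Kong's rate-limiting plugin rather than written by hand. To show the underlying idea, here is a generic token-bucket limiter kept per client. It is not Kong's implementation. The client key (for example, a consumer ID taken from a validated JWT) and the rate and capacity values are illustrative assumptions.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Generic token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per client key (hypothetical key, e.g. a consumer ID from the JWT)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

Injecting the clock keeps the limiter testable. With `rate=1.0` and `capacity=2.0`, a client gets two immediate requests, is refused on the third, and regains one request per second afterwards.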

Layer 3: Authentication - Keycloak

Identity and access management with SSO

Core Capabilities
  • Single Sign-On (SSO) across all platform services
  • Federation with Active Directory and LDAP
  • Integration with Germany's eID system
  • SAML 2.0 and OAuth2/OIDC support
  • Multi-factor authentication (MFA)
RBAC System
  • Platform Administrator: full control
  • Department Administrator: department management
  • AI Developer: model deployment and management
  • Document Manager: RAG system document control
  • Standard User: chat interface and permitted documents
  • Auditor: read-only access to logs and reports
  • Financial Controller: cost and billing information
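The role model above maps naturally onto a deny-by-default permission lookup. The sketch below assumes that directory groups synchronized from AD/LDAP are translated into platform roles. In Keycloak this mapping would typically be configured through federation mappers rather than coded by hand. Only the role names come from the list above. The group names and permission strings are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical permission strings; the role names are those listed above.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "Platform Administrator": frozenset({"*"}),
    "Department Administrator": frozenset({"department:manage"}),
    "AI Developer": frozenset({"models:deploy", "models:manage"}),
    "Document Manager": frozenset({"documents:manage"}),
    "Standard User": frozenset({"chat:use", "documents:read_permitted"}),
    "Auditor": frozenset({"logs:read", "reports:read"}),
    "Financial Controller": frozenset({"costs:read", "billing:read"}),
}

# Hypothetical mapping from synchronized directory groups to platform roles.
GROUP_TO_ROLE: Dict[str, str] = {
    "CN=AI-Platform-Admins": "Platform Administrator",
    "CN=AI-Developers": "AI Developer",
    "CN=Internal-Audit": "Auditor",
    "CN=All-Staff": "Standard User",
}


def roles_for_groups(groups: Iterable[str]) -> Set[str]:
    """Translate directory group memberships into platform roles; unknown groups are ignored."""
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


def is_allowed(roles: Iterable[str], permission: str) -> bool:
    """Deny by default; grant if any role carries the permission or the wildcard."""
    for role in roles:
        perms = ROLE_PERMISSIONS.get(role, frozenset())
        if "*" in perms or permission in perms:
            return True
    return False
```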

Layer 4: LLM Gateway - LiteLLM Proxy

Intelligent model routing and cost management

Unified Model Interface
  • OpenAI-compatible API for all models
  • Automatic request translation
  • Multi-model support
  • Model switching and load balancing
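Model switching and load balancing can be pictured as choosing among several deployments that serve the same model alias. LiteLLM ships its own configurable routing strategies. The sketch below is only a standalone illustration of a "least busy" choice. The deployment and model names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Deployment:
    name: str                 # hypothetical vLLM endpoint name
    model: str                # model alias exposed through the gateway
    healthy: bool = True
    in_flight: int = 0        # requests currently being served


@dataclass
class Router:
    deployments: List[Deployment] = field(default_factory=list)

    def pick(self, model: str) -> Deployment:
        """Return the healthy deployment of `model` with the fewest in-flight requests."""
        candidates = [d for d in self.deployments if d.model == model and d.healthy]
        if not candidates:
            raise LookupError(f"no healthy deployment for {model!r}")
        chosen = min(candidates, key=lambda d: d.in_flight)
        chosen.in_flight += 1     # caller must call release() when the request finishes
        return chosen

    def release(self, deployment: Deployment) -> None:
        deployment.in_flight = max(0, deployment.in_flight - 1)
```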
Cost Tracking
  • Per-user, per-department tracking
  • Token-based cost calculation
  • Real-time cost accumulation
  • Budget alerts and quota enforcement
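Token-based cost calculation and budget enforcement come down to simple per-request arithmetic. This is a minimal sketch with two assumptions. First, the per-token prices are hypothetical internal chargeback rates, since self-hosted models have no vendor price list. Second, spend lives in in-memory dictionaries rather than persistent storage. The 80% alert threshold is also an assumed policy.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, Tuple

# Hypothetical internal chargeback rates (EUR per 1,000 tokens): (input, output).
PRICES_PER_1K: Dict[str, Tuple[Decimal, Decimal]] = {
    "llama-3-70b": (Decimal("0.0020"), Decimal("0.0060")),
    "mistral-7b": (Decimal("0.0002"), Decimal("0.0006")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Token-based cost of one request; Decimal avoids float rounding in billing."""
    in_rate, out_rate = PRICES_PER_1K[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1000


class BudgetExceeded(Exception):
    """Raised when a department has exhausted its budget."""


@dataclass
class CostLedger:
    department_budgets: Dict[str, Decimal]
    alert_ratio: Decimal = Decimal("0.8")   # alert at 80% of budget (assumed policy)
    spend_by_user: Dict[str, Decimal] = field(default_factory=lambda: defaultdict(Decimal))
    spend_by_department: Dict[str, Decimal] = field(default_factory=lambda: defaultdict(Decimal))

    def check(self, department: str) -> None:
        """Call before forwarding a request: refuse it once the budget is used up."""
        if self.spend_by_department[department] >= self.department_budgets[department]:
            raise BudgetExceeded(department)

    def record(self, user: str, department: str, model: str,
               input_tokens: int, output_tokens: int) -> bool:
        """Accumulate cost per user and department; return True once the alert threshold is reached."""
        cost = request_cost(model, input_tokens, output_tokens)
        self.spend_by_user[user] += cost
        self.spend_by_department[department] += cost
        budget = self.department_budgets[department]
        return self.spend_by_department[department] >= self.alert_ratio * budget
```

For example, 1,000 input and 500 output tokens on the hypothetical `llama-3-70b` rate cost (1000 × 0.0020 + 500 × 0.0060) / 1000 = 0.005 EUR.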
Usage Analytics
  • Requests per second monitoring
  • Average latency per model
  • Token consumption trends
  • Model popularity metrics
Advanced Features
  • Intelligent caching
  • Rate limiting and fair use
  • API key management
  • Admin dashboard

Layer 5: Model Serving - vLLM

GPU-accelerated inference with PagedAttention

Why vLLM?

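High serving throughput built on the PagedAttention algorithm, which stores each request's KV cache in fixed-size, non-contiguous blocks so that very little GPU memory is lost to fragmentation or over-reservation (the vLLM authors report under 4% waste). They also report up to 24x higher throughput than Hugging Face Transformers. vLLM is widely used for serving transformer-based language models at scale.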
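PagedAttention treats the KV cache like virtual-memory pages. Each sequence has a block table pointing to physical blocks, and a new block is allocated only when the previous one fills. The toy allocator below shows only that bookkeeping. It is a conceptual sketch, not vLLM's code. The block size of 16 tokens and the sequence IDs are assumptions.

```python
from typing import Dict, List


class BlockAllocator:
    """Toy KV-cache block manager in the spirit of PagedAttention (not vLLM's code)."""

    def __init__(self, num_blocks: int, block_size: int = 16) -> None:
        self.block_size = block_size
        self.free: List[int] = list(range(num_blocks))   # physical block IDs
        self.tables: Dict[str, List[int]] = {}           # sequence -> block table
        self.lengths: Dict[str, int] = {}                # sequence -> tokens stored

    def append_token(self, seq: str) -> None:
        """Reserve KV-cache space for one more token of `seq`."""
        n = self.lengths.get(seq, 0)
        if n % self.block_size == 0:                     # current block full, or none yet
            if not self.free:
                raise MemoryError("out of KV-cache blocks; a scheduler would preempt")
            self.tables.setdefault(seq, []).append(self.free.pop())
        self.lengths[seq] = n + 1

    def release(self, seq: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)

    def wasted_slots(self, seq: str) -> int:
        """Unused token slots exist only in the last block, so this is always < block_size."""
        return len(self.tables[seq]) * self.block_size - self.lengths[seq]
```

A 20-token sequence occupies two 16-token blocks and wastes 12 slots. A contiguous reservation sized for the maximum output length would instead hold memory idle for the whole request. The freed capacity can store other requests' caches, which is what allows larger batches.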

PagedAttention
Up to 24x throughput (vs. Hugging Face Transformers)
Continuous Batching
Dynamic request processing
Model Support
Llama, Mistral, GPT-J, BLOOM
Performance Optimizations
  • Fused CUDA kernels for reduced overhead
  • Flash Attention implementation
  • Mixed-precision computation (FP16/FP32)
  • Model quantization (GPTQ, AWQ, SmoothQuant)
  • Tensor parallelism for multi-GPU serving
Advanced Features
  • Streaming responses via SSE/WebSockets
  • Speculative decoding for faster generation
  • KV cache sharing for common prompts
  • Multi-model serving and LoRA adapters
  • Comprehensive GPU utilization monitoring

Layer 6: Data & RAG Infrastructure

Vector databases and knowledge-enhanced AI

Qdrant Vector Database

Stores document embeddings for semantic similarity search, enabling AI models to access and reason over government documents while maintaining security and access controls.

Document Processing
  • Multilingual embedding models (German focus)
  • Document chunking with overlap (512-1024 tokens)
  • OCR for scanned documents (Tesseract)
  • Metadata extraction and classification
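Chunking with overlap can be sketched as a sliding window over a token sequence. The example assumes the text is already tokenized; a real pipeline would use the embedding model's own tokenizer. The 512-token chunk size falls within the range above. The 64-token overlap is an illustrative assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_with_overlap(tokens: Sequence[T], chunk_size: int = 512,
                       overlap: int = 64) -> List[List[T]]:
    """Split `tokens` into windows of `chunk_size`, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):   # this window already reaches the end
            break
    return chunks
```

The overlap ensures that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some tokens twice.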
Access Control
  • Security clearance-based filtering
  • Department-level partitioning
  • Server-side authorization enforcement
  • Classification level restrictions (Public/Internal/Confidential/Secret)
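Semantic search with server-side access control comes down to two steps. First, candidates are filtered by the caller's clearance and department. Second, the remaining candidates are ranked by vector similarity. In Qdrant this would be expressed as a payload filter on the search request. The pure-Python sketch below mirrors only the logic and does not use Qdrant's client API. The payload fields (`department`, `classification`) and toy vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set

# Classification levels from the list above, ordered lowest to highest.
LEVELS = ["Public", "Internal", "Confidential", "Secret"]


@dataclass
class Chunk:
    text: str
    vector: Sequence[float]
    department: str          # hypothetical payload field
    classification: str      # one of LEVELS


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Sequence[float], chunks: List[Chunk], clearance: str,
           departments: Set[str], top_k: int = 3) -> List[Chunk]:
    """Rank only chunks the caller may see; filtering happens before scoring."""
    max_level = LEVELS.index(clearance)
    visible = [
        c for c in chunks
        if LEVELS.index(c.classification) <= max_level and c.department in departments
    ]
    return sorted(visible, key=lambda c: cosine(query, c.vector), reverse=True)[:top_k]
```

Filtering before ranking ensures that restricted chunks can never reach the model's context, however similar they are to the query.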

DevOps & Infrastructure

Kubernetes Orchestration

  • Custom GPU operators for resource allocation
  • Helm charts for all microservices
  • Auto-scaling for CPU and GPU workloads
  • High availability across multiple nodes
  • Pod Security Standards enforcement
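Auto-scaling in Kubernetes is driven by the Horizontal Pod Autoscaler rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch reproduces that arithmetic for a hypothetical per-pod metric, such as queued inference requests on a vLLM deployment. It ignores stabilization windows, pod readiness, and missing-metric handling, and the replica bounds are assumed values.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA formula: ceil(current * current_metric / target_metric), clamped to bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:       # close enough to target: no change
        return current
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas averaging 30 queued requests against a target of 10 scale to 6, while a reading of 10.5 stays within tolerance and leaves the deployment unchanged.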

CI/CD Pipeline

  • GitOps workflows with ArgoCD
  • Automated compliance scanning (licenses, vulnerabilities)
  • Blue-green and canary deployment strategies
  • Model versioning and MLOps pipelines
  • Automated rollback procedures

Monitoring & Observability

  • Prometheus for metrics collection
  • Grafana dashboards for visualization
  • ELK Stack for log aggregation
  • Jaeger for distributed tracing
  • GPU utilization and performance tracking

Security & Compliance

  • BSI (German Federal Office for Information Security) compliance
  • GDPR data protection compliance
  • End-to-end encryption for all communications
  • Comprehensive audit logging
  • Data residency within German/EU borders

Complete Technology Stack

Frontend

OpenWebUI · SvelteKit · TailwindCSS · TypeScript

API Gateway

Kong Gateway · Kong Manager · Nginx · Load Balancing

Authentication

Keycloak · SAML 2.0 · OAuth2/OIDC · eID Integration

LLM Gateway

LiteLLM Proxy · Cost Tracking · Quota Management · API Keys

Model Serving

vLLM · PagedAttention · GPTQ · AWQ

AI Models

Llama 3 · Mistral · GPT-J · BLOOM

Vector DB

Qdrant · PostgreSQL · Embeddings · RAG

Infrastructure

Kubernetes · Helm · ArgoCD · Terraform

Monitoring

Prometheus · Grafana · ELK Stack · Jaeger

DevOps

GitOps · CI/CD · Jenkins · GitHub Actions

Security

BSI Compliance · GDPR · RBAC · MFA

Languages

Python · TypeScript · Go · Bash

Project Impact

100% Digital Sovereignty: complete data and infrastructure control
Zero Vendor Lock-in: fully open-source stack
1000s of Government Users: serving multiple departments

The project established a reference architecture for government AI platforms across Europe, demonstrating that complete digital sovereignty is achievable without giving up cutting-edge AI capabilities. The platform removes dependency on foreign-owned AI services, ensures GDPR and BSI compliance, and provides a ChatGPT-like experience with full control over data, infrastructure, and AI models. In doing so, it points the way toward sovereign AI for public sector organizations across Germany and the European Union.