Table of Contents
- The Challenge
- Architecture & Solution
- Tech Stack
- Key Engineering Decisions
- Backend: Core API Server
- Database Schema & Security
- Docker Engine Module
- SSH & Terminal Engine
- Go Metrics Collector
- Authentication & RBAC
- GitOps Deployment Engine
- Frontend: Vite Dashboard
- Infrastructure Layout
- Deployment
- Roadmap
The Challenge
Managing multi-node VPS infrastructure manually presents significant challenges:
- Fragmented tooling: Each server requires separate SSH sessions, Docker commands, and monitoring tools
- No unified visibility: Container logs, system metrics, and deployment status live in different places
- Security complexity: Managing SSH keys, firewall rules, and access controls across multiple nodes is error-prone
- Manual deployments: No standardized deployment pipeline for GitOps workflows
- No audit trail: Actions taken on infrastructure have no accountability or replay capability
- Limited real-time insight: Polling-based monitoring misses transient issues and state changes
The requirement: Build a unified infrastructure management platform that provides real-time visibility, Docker orchestration, SSH terminal access, GitOps deployment, and comprehensive audit logging across a multi-node VPS cluster — all self-hosted with zero cloud dependencies.
Architecture & Solution
Server Box v2 implements the "UI as Lens" principle where all business logic lives exclusively in the Core API backend. The frontend is a read-display-emit interface with zero authoritative state.
Three Laws of the UI Layer
- Read state from the Core API — never compute locally
- Emit intent via API calls — never execute logic locally
- Let Core decide all outcomes — never hold authoritative state
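The three laws can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `CoreApi` interface and an in-memory `FakeCore` standing in for the real Core API (neither name comes from the project). The view keeps only a cached snapshot, forwards intent, and re-reads state instead of predicting the outcome.

```typescript
// Hypothetical types for illustration; none of these names come from Server Box.
type ContainerState = 'running' | 'stopped';
interface Intent { type: 'start' | 'stop'; containerId: string }

interface CoreApi {
  getState(): Map<string, ContainerState>; // law 1: read
  submit(intent: Intent): boolean;         // laws 2 and 3: emit intent, Core decides
}

// Stand-in for the Core API: the only place where rules are evaluated.
class FakeCore implements CoreApi {
  private containers = new Map<string, ContainerState>([['web', 'stopped']]);

  getState(): Map<string, ContainerState> {
    return new Map(this.containers); // hand out a copy, never the authority
  }

  submit(intent: Intent): boolean {
    if (!this.containers.has(intent.containerId)) return false; // unknown: rejected
    this.containers.set(intent.containerId, intent.type === 'start' ? 'running' : 'stopped');
    return true;
  }
}

// The "lens": renders what Core reports and forwards user intent.
class DashboardView {
  private core: CoreApi;
  private snapshot: Map<string, ContainerState>;

  constructor(core: CoreApi) {
    this.core = core;
    this.snapshot = core.getState();
  }

  click(intent: Intent): boolean {
    const accepted = this.core.submit(intent); // no local decision
    this.snapshot = this.core.getState();      // re-read instead of guessing
    return accepted;
  }

  render(id: string): string {
    return `${id}: ${this.snapshot.get(id) ?? 'unknown'}`;
  }
}

const view = new DashboardView(new FakeCore());
```

Because the view never mutates its snapshot directly, a rejected intent leaves the display unchanged without any rollback logic on the client.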
Component Responsibilities
| Component | Responsibility |
|---|---|
| Core API | All business logic, state, auth, orchestration |
| Dashboard | Read/display state, emit user intent |
| Docker Engine | Container lifecycle, image management, stack orchestration |
| SSH Terminal | Connection pooling, tmux sessions, file transfer |
| Metrics Collector | Go-based agent for real-time telemetry |
| GitOps Engine | Deployment pipelines, rollback, preview environments |
| PostgreSQL | Persistent state, audit trail |
| Redis + BullMQ | Caching, job queues, rate limiting |
Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Backend Framework | Hono 4.x | Fast, lightweight web framework |
| API Layer | tRPC 11 | End-to-end type-safe API without code generation |
| Language | TypeScript | Full-stack type safety |
| Database | PostgreSQL 16 | Production data store |
| ORM | Drizzle ORM | Type-safe SQL, zero runtime overhead |
| Queue | BullMQ + Redis | Job scheduling, rate limiting |
| Metrics | Go | High-performance telemetry collection |
| Frontend | React 19 + Vite | Modern SPA with fast HMR |
| State Management | TanStack Query + Router | Server state + routing |
| Real-time | SSE + WebSocket | Live updates to dashboard |
| Styling | CSS Modules + Framer Motion | Deep Obsidian design system |
| Infrastructure | Ansible | Server hardening, provisioning |
| Terminal | xterm.js | Browser-based terminal emulation |
| Container | Docker + Kubernetes | Orchestration targets |
Key Engineering Decisions
1. Hono over Express — 5x Performance
// core/src/index.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { trpcServer } from '@hono/trpc-server'; // the Hono adapter ships as its own package
import { appRouter } from './routers/index';
import { authMiddleware } from './middleware/auth';
import { logger } from './middleware/logger';
const app = new Hono();
// Global middleware stack
app.use('*', logger());
// Browsers reject a wildcard origin on credentialed requests, so name the dashboard origin.
// DASHBOARD_ORIGIN is an assumed environment variable.
app.use('*', cors({ origin: process.env.DASHBOARD_ORIGIN ?? 'http://localhost:5173', credentials: true }));
app.use('*', authMiddleware);
// tRPC route
app.use('/trpc/*', trpcServer({ router: appRouter }));
// Health check
app.get('/health', (c) => c.json({ status: 'ok', timestamp: Date.now() }));
export default app;
2. tRPC for Type-Safe API
// core/src/routers/docker.ts
import { z } from 'zod';
import { router, publicProcedure, protectedProcedure } from '../trpc';
export const dockerRouter = router({
// List containers across all nodes
listContainers: protectedProcedure
.input(z.object({ nodeId: z.string().optional() }))
.query(async ({ ctx, input }) => {
const nodes = await ctx.nodeService.getAccessibleNodes(ctx.user.id);
return ctx.dockerEngine.listContainers(nodes, input.nodeId);
}),
// Start/Stop container
controlContainer: protectedProcedure
.input(z.object({
containerId: z.string(),
action: z.enum(['start', 'stop', 'restart', 'remove']),
}))
.mutation(async ({ ctx, input }) => {
return ctx.dockerEngine.control(input.containerId, input.action);
}),
// Deploy from compose
deployStack: protectedProcedure
.input(z.object({
name: z.string(),
compose: z.string(),
nodeId: z.string(),
}))
.mutation(async ({ ctx, input }) => {
return ctx.dockerEngine.deployStack(input);
}),
});
3. SSH Connection Pooling
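The pooling policy in this section (reuse a connection while it is younger than a TTL, evict the least recently used entry when the pool is full) can be sketched without ssh2. `TtlLruPool`, its `create`/`dispose` callbacks, and the injectable `now` clock are illustrative names, not project code.

```typescript
// Illustrative pool, generic over the pooled resource so it runs without ssh2.
interface Entry<T> { value: T; lastUsed: number }

interface PoolOptions<T> {
  max: number;
  ttlMs: number;
  create: (key: string) => T;
  dispose: (value: T) => void;
  now?: () => number; // injectable clock, handy for tests
}

class TtlLruPool<T> {
  private entries = new Map<string, Entry<T>>();
  private opts: PoolOptions<T>;
  private now: () => number;

  constructor(opts: PoolOptions<T>) {
    this.opts = opts;
    this.now = opts.now ?? Date.now;
  }

  get size(): number { return this.entries.size; }

  get(key: string): T {
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.lastUsed < this.opts.ttlMs) {
      hit.lastUsed = this.now(); // fresh enough: reuse and mark as recently used
      return hit.value;
    }
    if (hit) {
      // Expired: dispose of it before creating a replacement
      this.opts.dispose(hit.value);
      this.entries.delete(key);
    }
    if (this.entries.size >= this.opts.max) this.evictLeastRecentlyUsed();
    const value = this.opts.create(key);
    this.entries.set(key, { value, lastUsed: this.now() });
    return value;
  }

  private evictLeastRecentlyUsed(): void {
    let oldestKey: string | undefined;
    let oldest = Infinity;
    for (const [key, entry] of this.entries) {
      if (entry.lastUsed < oldest) { oldest = entry.lastUsed; oldestKey = key; }
    }
    if (oldestKey === undefined) return;
    const victim = this.entries.get(oldestKey);
    if (victim) this.opts.dispose(victim.value);
    this.entries.delete(oldestKey);
  }
}
```

Disposing expired entries matters for SSH in particular: an entry that is silently overwritten leaves its socket open on the remote node.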
// core/src/services/ssh-pool.ts
import { Client } from 'ssh2';
// Node is the project's node record (id, host, port, sshUser, privateKey)
interface PooledConnection {
  client: Client;
  lastUsed: number;
  nodeId: string;
}
class SSHConnectionPool {
  private pool = new Map<string, PooledConnection>();
  private maxConnections = 10;
  private ttl = 5 * 60 * 1000; // 5 minutes
  async getConnection(node: Node): Promise<Client> {
    const existing = this.pool.get(node.id);
    if (existing && Date.now() - existing.lastUsed < this.ttl) {
      existing.lastUsed = Date.now();
      return existing.client;
    }
    if (existing) {
      // Expired: close the stale connection before replacing it
      existing.client.end();
      this.pool.delete(node.id);
    }
    // Create new connection and wait for the handshake to finish
    const client = new Client();
    await new Promise<void>((resolve, reject) => {
      client
        .once('ready', () => resolve())
        .once('error', reject)
        .connect({
          host: node.host,
          port: node.port || 22,
          username: node.sshUser,
          privateKey: node.privateKey,
        });
    });
    if (this.pool.size >= this.maxConnections) {
      // Evict the least recently used connection
      const oldest = Array.from(this.pool.values())
        .sort((a, b) => a.lastUsed - b.lastUsed)[0];
      oldest.client.end();
      this.pool.delete(oldest.nodeId);
    }
    this.pool.set(node.id, { client, lastUsed: Date.now(), nodeId: node.id });
    return client;
  }
  // Takes the node record because getConnection needs its host and credentials
  async executeCommand(node: Node, command: string): Promise<string> {
    const client = await this.getConnection(node);
    return new Promise((resolve, reject) => {
      client.exec(command, (err, stream) => {
        if (err) return reject(err);
        let output = '';
        stream.on('data', (data: Buffer) => { output += data.toString(); });
        stream.on('close', () => resolve(output));
      });
    });
  }
}
4. Immutable Audit Trail
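A per-record checksum on its own mostly detects accidental corruption, since anyone able to rewrite a row can also recompute its hash. One common way to make a log tamper-evident is to chain each record to the hash of the one before it. The sketch below illustrates that idea with hypothetical names (`AuditEntry`, `appendEntry`, `verifyChain`); it is not the project's implementation.

```typescript
import { createHash } from 'node:crypto';

interface AuditEntry {
  actorId: string;
  action: string;
  timestamp: string; // ISO 8601
  prevHash: string;  // hash of the previous entry, or GENESIS for the first
  hash: string;
}

const GENESIS = '0'.repeat(64);

function hashEntry(e: Omit<AuditEntry, 'hash'>): string {
  // A fixed field order keeps the serialization deterministic.
  const payload = JSON.stringify([e.actorId, e.action, e.timestamp, e.prevHash]);
  return createHash('sha256').update(payload).digest('hex');
}

function appendEntry(log: AuditEntry[], actorId: string, action: string, timestamp: string): AuditEntry {
  const last = log[log.length - 1];
  const base = { actorId, action, timestamp, prevHash: last ? last.hash : GENESIS };
  const entry: AuditEntry = { ...base, hash: hashEntry(base) };
  log.push(entry);
  return entry;
}

// Returns the index of the first broken entry, or -1 if the chain is intact.
function verifyChain(log: readonly AuditEntry[]): number {
  let prev = GENESIS;
  let index = 0;
  for (const entry of log) {
    const { hash, ...rest } = entry;
    if (rest.prevHash !== prev || hashEntry(rest) !== hash) return index;
    prev = hash;
    index++;
  }
  return -1;
}
```

Editing or deleting any record changes every later `prevHash`, so a verifier that knows only the latest hash can detect the change.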
// core/src/services/audit.ts
import crypto from 'node:crypto';
import type { Context, Next } from 'hono';
import { drizzle } from 'drizzle-orm/postgres-js';
import postgres from 'postgres';
import { auditLogs } from '../schema/audit';
// DATABASE_URL is an assumed environment variable
const db = drizzle(postgres(process.env.DATABASE_URL!));
export function logAudit(params: {
  actorId: string;
  actorIp: string;
  action: string;
  target: string;
  targetId?: string;
  outcome: 'success' | 'failure';
  details?: object;
}) {
  const record = {
    id: crypto.randomUUID(),
    ...params,
    timestamp: new Date(),
  };
  const log = {
    ...record,
    // Hash of the entire record (id and timestamp included) for tamper detection
    checksum: crypto.createHash('sha256')
      .update(JSON.stringify(record))
      .digest('hex'),
  };
  return db.insert(auditLogs).values(log);
}
// Middleware logs every non-GET request (tRPC mutations arrive as POST)
export const auditMiddleware = async (c: Context, next: Next) => {
  await next();
  if (c.req.method !== 'GET') {
    await logAudit({
      // Assumes the auth middleware stored the user with c.set('user', ...)
      actorId: c.get('user')?.id ?? 'anonymous',
      // Assumes a reverse proxy that sets X-Forwarded-For
      actorIp: c.req.header('x-forwarded-for') ?? 'unknown',
      action: c.req.method + ' ' + c.req.path,
      target: c.req.path,
      outcome: c.res.status < 400 ? 'success' : 'failure',
    });
  }
};
5. SSE Event Bus for Real-Time Updates
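The bus in this section writes each message as a single `data:` field. Server-Sent Events also define `event:` and `id:` fields, and a payload containing newlines needs one `data:` line per line. A small formatter, sketched here under the assumption that sinks accept a preformatted string (`formatSSE` and `SSEMessage` are hypothetical names), could look like this:

```typescript
interface SSEMessage {
  data: string;
  event?: string; // lets the client use addEventListener(event, ...)
  id?: string;    // sent back by the browser as Last-Event-ID on reconnect
}

// Builds one SSE frame. The event and id values must not contain newlines.
function formatSSE(msg: SSEMessage): string {
  const lines: string[] = [];
  if (msg.event !== undefined) lines.push(`event: ${msg.event}`);
  if (msg.id !== undefined) lines.push(`id: ${msg.id}`);
  for (const line of msg.data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the frame.
  return lines.join('\n') + '\n\n';
}

const frame = formatSSE({
  event: 'metrics',
  id: '42',
  data: JSON.stringify({ nodeId: 'node-1', cpu: 0.37 }),
});
```

Named events let the dashboard attach one listener per channel instead of switching on a field inside the payload.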
// core/src/services/sse-bus.ts
type EventChannel = 'metrics' | 'containers' | 'deployments' | 'alerts';
// Minimal sink contract: anything that can write a preformatted SSE frame
interface SSEventSink {
  send(frame: string): void;
}
class SSEEventBus {
  private channels = new Map<EventChannel, Set<SSEventSink>>();
  // Returns an unsubscribe function; call it when the client disconnects
  subscribe(channel: EventChannel, sink: SSEventSink): () => void {
    if (!this.channels.has(channel)) {
      this.channels.set(channel, new Set());
    }
    this.channels.get(channel)!.add(sink);
    return () => {
      this.channels.get(channel)?.delete(sink);
    };
  }
  publish(channel: EventChannel, data: object) {
    const sinks = this.channels.get(channel);
    if (sinks) {
      sinks.forEach(sink => {
        sink.send(`data: ${JSON.stringify(data)}\n\n`);
      });
    }
  }
  // Broadcast metrics from Go collector
  broadcastMetrics(nodeId: string, metrics: SystemMetrics) {
    this.publish('metrics', { nodeId, ...metrics });
  }
  // Stream container state changes
  broadcastContainerEvent(event: ContainerEvent) {
    this.publish('containers', event);
  }
}
export const sseBus = new SSEEventBus();
Backend: Core API Server
Module Architecture
| Module | Responsibility |
|---|---|
| Auth | JWT tokens, session management, 2FA |
| RBAC | Role-based permissions, EGA (Ephemeral Guest Access) |
| Node Manager | VPS registration, connection health |
| Docker Engine | Container, image, volume, network, stack management |
| Kubernetes | Cluster management, Helm integration |
| SSH Terminal | Connection pooling, tmux sessions, SFTP |
| Metrics | Real-time telemetry from Go collector |
| GitOps | Deployment pipelines, rollback, previews |
| Scheduler | BullMQ job queues for async tasks |
| Backup | Google Drive integration via rclone |
| Notifications | Email, webhook, Discord alerts |
| Audit | Immutable action logging |
tRPC Router Map
// core/src/routers/index.ts
export const appRouter = router({
// Auth
auth: authRouter,
// Node management
nodes: nodeRouter,
// Docker
docker: dockerRouter,
// Kubernetes
k8s: k8sRouter,
// SSH/Terminal
terminal: terminalRouter,
// Metrics
metrics: metricsRouter,
// Deployments
deploy: deployRouter,
// Backups
backup: backupRouter,
// Users & RBAC
users: userRouter,
// Audit
audit: auditRouter,
// Health
health: healthRouter,
});
Database Schema & Security
Core Tables
| Table | Purpose |
|---|---|
| users | Admin and operator accounts |
| roles | RBAC role definitions |
| permissions | Granular permission matrix |
| nodes | Registered VPS nodes |
| containers | Docker container state |
| images | Docker images |
| stacks | Docker Compose stacks |
| deployments | GitOps deployment history |
| backups | Backup job records |
| audit_logs | Immutable action trail |
| sessions | Active user sessions |
| notifications | Alert queue |
| metrics | Time-series telemetry |
Sensitive Data Encryption
// core/src/lib/encryption.ts
import crypto from 'node:crypto';
const ALGORITHM = 'aes-256-gcm';
const IV_LENGTH = 12; // 96-bit IV, the size recommended for GCM
const TAG_LENGTH = 16; // GCM auth tag size in bytes
export function encryptField(plaintext: string, key: string): string {
const iv = crypto.randomBytes(IV_LENGTH);
const cipher = crypto.createCipheriv(ALGORITHM, Buffer.from(key, 'hex'), iv);
let encrypted = cipher.update(plaintext, 'utf8', 'hex');
encrypted += cipher.final('hex');
const tag = cipher.getAuthTag();
return iv.toString('hex') + ':' + tag.toString('hex') + ':' + encrypted;
}
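Reading a field back reverses the `iv:tag:ciphertext` layout that `encryptField` produces: split on `:`, set the auth tag, and let GCM verify it during `final()`. A minimal sketch of the inverse follows. `decryptField` is a hypothetical name that does not appear in the source, and `encryptField` is repeated here (with a 12-byte IV) only so the round trip is self-contained.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

const ALGORITHM = 'aes-256-gcm';

function encryptField(plaintext: string, keyHex: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv(ALGORITHM, Buffer.from(keyHex, 'hex'), iv);
  const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, encrypted].map((b) => b.toString('hex')).join(':');
}

function decryptField(stored: string, keyHex: string): string {
  const parts = stored.split(':');
  if (parts.length !== 3) throw new Error('Malformed encrypted field');
  const iv = Buffer.from(parts[0]!, 'hex');
  const tag = Buffer.from(parts[1]!, 'hex');
  const encrypted = Buffer.from(parts[2]!, 'hex');
  const decipher = createDecipheriv(ALGORITHM, Buffer.from(keyHex, 'hex'), iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match (wrong key or modified data).
  return Buffer.concat([decipher.update(encrypted), decipher.final()]).toString('utf8');
}
```

Because the IV is parsed from the stored string, the same `decryptField` reads values written with either a 12-byte or a 16-byte IV.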
SSH keys and API tokens are stored encrypted at rest.
Docker Engine Module
Container Management
// core/src/services/docker-engine.ts
import { PassThrough } from 'node:stream';
// Assumes getDockerClient returns a dockerode instance
class DockerEngine {
  async listContainers(nodeId: string, all = true): Promise<Container[]> {
    const client = await this.getDockerClient(nodeId);
    const containers = await client.listContainers({ all });
    return containers.map(c => ({
      id: c.Id,
      // Docker prefixes container names with a leading slash
      name: c.Names[0]?.replace(/^\//, ''),
      image: c.Image,
      status: c.State,
      created: c.Created,
      ports: c.Ports,
    }));
  }
  async startContainer(nodeId: string, containerId: string): Promise<void> {
    const client = await this.getDockerClient(nodeId);
    const container = client.getContainer(containerId);
    await container.start();
    await this.logAction(nodeId, 'container_start', containerId);
  }
  async stopContainer(nodeId: string, containerId: string): Promise<void> {
    const client = await this.getDockerClient(nodeId);
    const container = client.getContainer(containerId);
    await container.stop();
    await this.logAction(nodeId, 'container_stop', containerId);
  }
  async execContainer(nodeId: string, containerId: string, cmd: string[]): Promise<ExecResult> {
    const client = await this.getDockerClient(nodeId);
    const container = client.getContainer(containerId);
    const exec = await container.exec({
      Cmd: cmd,
      AttachStdout: true,
      AttachStderr: true,
    });
    const stream = await exec.start({ hijack: true });
    // Docker multiplexes stdout and stderr on one stream; demux both into one buffer
    const chunks: Buffer[] = [];
    const sink = new PassThrough();
    sink.on('data', (chunk: Buffer) => chunks.push(chunk));
    client.modem.demuxStream(stream, sink, sink);
    await new Promise<void>((resolve, reject) => {
      stream.once('end', resolve);
      stream.once('error', reject);
    });
    // The real exit code is only available from inspect() once the stream has ended
    const { ExitCode } = await exec.inspect();
    return { output: Buffer.concat(chunks).toString('utf8'), exitCode: ExitCode ?? -1 };
  }
}
Stack Deployment (Docker Compose)
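// core/src/services/stack-manager.ts
import YAML from 'yaml';
// Stack and service names are interpolated into shell commands, so restrict the charset
const SAFE_NAME = /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/;
function assertSafeName(value: string): string {
  if (!SAFE_NAME.test(value)) throw new Error(`Invalid name: ${value}`);
  return value;
}
class StackManager {
  async deployStack(nodeId: string, name: string, compose: string): Promise<Deployment> {
    const dir = `/opt/stacks/${assertSafeName(name)}`;
    const parsed = YAML.parse(compose);
    // Validate and prepare
    await this.validateCompose(parsed);
    // Make sure the stack directory exists, then write the compose file to the node
    await this.runCommand(nodeId, `mkdir -p ${dir}`);
    await this.scpToNode(nodeId, `${dir}/docker-compose.yml`, compose);
    // Pull images
    await this.runCommand(nodeId, `cd ${dir} && docker compose pull`);
    // Deploy
    await this.runCommand(nodeId, `cd ${dir} && docker compose up -d`);
    // Record deployment
    return this.recordDeployment({ nodeId, name, compose, status: 'deployed' });
  }
  async scaleService(nodeId: string, stack: string, service: string, replicas: number): Promise<void> {
    if (!Number.isInteger(replicas) || replicas < 0) {
      throw new Error(`Invalid replica count: ${replicas}`);
    }
    await this.runCommand(
      nodeId,
      `cd /opt/stacks/${assertSafeName(stack)} && docker compose up -d --scale ${assertSafeName(service)}=${replicas}`
    );
  }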
}
SSH & Terminal Engine
Connection Pool
The SSH module maintains a pool of connections to reduce latency:
// core/src/services/ssh-pool.ts (detailed)
class SSHConnectionPool {
  private pool = new Map<string, SSHConnection>();
  private maxConnections = 10; // total connections across all nodes
  async getConnection(node: Node): Promise<SSHConnection> {
    const key = node.id;
    const existing = this.pool.get(key);
    if (existing?.isAlive()) {
      return existing;
    }
    // Drop a dead entry so it does not count toward the pool size
    this.pool.delete(key);
    // Create new connection
    const conn = await this.createConnection(node);
    // Manage pool size
    if (this.pool.size >= this.maxConnections) {
      await this.evictOldest();
    }
    this.pool.set(key, conn);
    return conn;
  }
  async createConnection(node: Node): Promise<SSHConnection> {
    const client = new Client();
    let alive = false;
    // ssh2's connect() is event-based, so wait for 'ready' or fail on 'error'
    await new Promise<void>((resolve, reject) => {
      client
        .once('ready', () => { alive = true; resolve(); })
        .once('error', reject)
        .connect({
          host: node.host,
          port: node.port || 22,
          username: node.sshUser,
          privateKey: node.privateKey,
          readyTimeout: 30000,
        });
    });
    client.once('close', () => { alive = false; });
    return { client, nodeId: node.id, createdAt: Date.now(), isAlive: () => alive };
  }
}
Immortal Terminal Sessions (Tmux-as-a-Service)
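Session names and keystrokes end up inside a shell command on the remote node, so they need quoting before interpolation. A minimal sketch, assuming a POSIX shell on the node; `shellQuote`, `assertSessionName`, and `tmuxSendKeysCommand` are hypothetical helpers, not project code.

```typescript
// Wrap a value in single quotes; an embedded ' becomes '\'' (close, escaped quote, reopen).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// tmux treats "." and ":" specially in targets, so keep names to a plain charset.
function assertSessionName(name: string): string {
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(name)) {
    throw new Error(`Invalid tmux session name: ${name}`);
  }
  return name;
}

function tmuxSendKeysCommand(session: string, input: string): string {
  // -l sends the text literally instead of interpreting key names; "--" ends option parsing.
  return `tmux send-keys -t ${assertSessionName(session)} -l -- ${shellQuote(input)}`;
}
```

With `-l`, a trailing `Enter` has to be sent as a separate `tmux send-keys -t <session> Enter` command, because key names are no longer interpreted.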
// core/src/services/tmux-service.ts
// Single-quote a value for a POSIX shell so input cannot break out of the command
const sh = (value: string): string => "'" + value.replace(/'/g, "'\\''") + "'";
class TmuxService {
  async createSession(nodeId: string, sessionName: string): Promise<void> {
    const pool = this.sshPool.get(nodeId);
    await pool.execute(`tmux new-session -d -s ${sh(sessionName)}`);
  }
  async sendInput(nodeId: string, sessionName: string, input: string): Promise<void> {
    const pool = this.sshPool.get(nodeId);
    const target = sh(sessionName);
    // -l sends the text literally; Enter is then sent as a separate key press
    await pool.execute(
      `tmux send-keys -t ${target} -l -- ${sh(input)} && tmux send-keys -t ${target} Enter`
    );
  }
  async capturePane(nodeId: string, sessionName: string): Promise<string> {
    const pool = this.sshPool.get(nodeId);
    return pool.execute(`tmux capture-pane -t ${sh(sessionName)} -p`);
  }
  async listSessions(nodeId: string): Promise<TmuxSession[]> {
    const pool = this.sshPool.get(nodeId);
    const output = await pool.execute('tmux list-sessions -F "#{session_name}"');
    return output.split('\n').filter(Boolean).map(name => ({ name }));
  }
}
Go Metrics Collector
Real-Time Telemetry
The Go-based collector provides high-performance metrics:
// collector/internal/collector/metrics.go
func (c *Collector) collectSystemMetrics() SystemMetrics {
return SystemMetrics{
CPU: getCPUUsage(),
Memory: getMemoryUsage(),
Disk: getDiskUsage(),
Network: getNetworkStats(),
LoadAverage: getLoadAverage(),
ProcessCount: getProcessCount(),
Timestamp: time.Now().Unix(),
}
}
func getCPUUsage() CPUStats {
	// calculateCPUPct holds the sampling logic; no filesystem call is needed for CPU stats
	return CPUStats{
		Usage: calculateCPUPct(),
		Cores: runtime.NumCPU(),
	}
}
func (c *Collector) streamMetrics() {
ticker := time.NewTicker(5 * time.Second)
for range ticker.C {
metrics := c.collectSystemMetrics()
// Stream via WebSocket to Core
c.wsClient.Send(MetricMessage{
NodeID: c.config.NodeID,
Type: "system",
Data: metrics,
})
}
}
Container Metrics
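// Requires the "log" and "strings" packages.
func (c *Collector) collectContainerMetrics() []ContainerMetrics {
	containers, err := c.docker.ListContainers()
	if err != nil {
		log.Printf("list containers: %v", err)
		return nil
	}
	results := make([]ContainerMetrics, 0, len(containers))
	for _, container := range containers {
		stats, err := c.docker.GetContainerStats(container.ID)
		if err != nil {
			// The container may have exited between list and stats; skip it
			log.Printf("stats for %s: %v", container.ID, err)
			continue
		}
		// Docker prefixes names with "/"; fall back to the ID if no name is set
		name := container.ID
		if len(container.Names) > 0 {
			name = strings.TrimPrefix(container.Names[0], "/")
		}
		results = append(results, ContainerMetrics{
			ID:      container.ID,
			Name:    name,
			CPU:     stats.CPUStats,
			Memory:  stats.MemoryStats,
			Network: stats.Networks,
			BlockIO: stats.BlkioStats,
		})
	}
	return results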
}
Authentication & RBAC
Auth Flow
User Login Flow:
1. POST /auth/login with credentials
2. Verify against database (bcrypt)
3. Generate JWT (access + refresh)
4. Store session in Redis
5. Return tokens + set cookies
6. SSE subscribes to user notifications
RBAC Model
| Role | Permissions |
|---|---|
| Admin | Full system access, user management, destructive actions |
| Operator | Node access, container management, deployments |
| Viewer | Read-only access to metrics and logs |
| EGA (Guest) | Time-limited, limited scope access for auditors |
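The role table can be read as a mapping from role to permission set, with guest sessions narrowed further by explicit scopes and an expiry. The sketch below assumes permission strings in the `view:metrics` style used by the EGA code; the specific permission names and the `can` helper are illustrative, not the project's actual matrix.

```typescript
type Role = 'admin' | 'operator' | 'viewer' | 'guest';

// Illustrative permission sets; '*' grants everything.
const ROLE_PERMISSIONS: Record<Role, readonly string[]> = {
  admin: ['*'],
  operator: ['view:metrics', 'view:logs', 'view:containers', 'manage:containers', 'manage:deployments'],
  viewer: ['view:metrics', 'view:logs', 'view:containers'],
  guest: [], // guests get nothing by role; access comes from session scopes
};

interface Principal {
  role: Role;
  scopes?: readonly string[]; // only meaningful for guests
  expiresAt?: number;         // epoch milliseconds; only meaningful for guests
}

function can(p: Principal, permission: string, now: number = Date.now()): boolean {
  if (p.role === 'guest') {
    // A guest without an expiry, or past it, is denied outright.
    if (p.expiresAt === undefined || p.expiresAt <= now) return false;
    return (p.scopes ?? []).includes(permission);
  }
  const granted = ROLE_PERMISSIONS[p.role];
  return granted.includes('*') || granted.includes(permission);
}
```

Checking scope and expiry in the same function keeps a valid but out-of-scope guest token from reaching a handler.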
Ephemeral Guest Access (EGA)
// core/src/services/ega-service.ts
import crypto from 'node:crypto';
import { eq } from 'drizzle-orm';
class EGAService {
async createGuestSession(duration: number, scopes: string[]): Promise<EGASession> {
const session = {
id: crypto.randomUUID(),
token: this.generateSecureToken(),
expiresAt: Date.now() + duration * 1000,
scopes: scopes, // e.g., ['view:metrics', 'view:containers']
createdBy: 'admin',
createdAt: Date.now(),
};
await db.insert(egaSessions).values(session);
return session;
}
async validateGuest(token: string): Promise<boolean> {
const session = await db.query.egaSessions.findFirst({
where: eq(egaSessions.token, token),
});
if (!session || session.expiresAt < Date.now()) {
return false;
}
return true;
}
}
GitOps Deployment Engine
Deployment Flow
Rollback Support
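Rolling back means redeploying the most recent deployment of the same app that is older than the one being reverted. That selection can be written as a pure function and checked without a database. The `DeploymentRecord` shape, its `status` field, and the `findRollbackTarget` name are assumptions for illustration.

```typescript
interface DeploymentRecord {
  id: string;
  appId: string;
  createdAt: number; // epoch milliseconds
  status: 'deployed' | 'failed';
}

function findRollbackTarget(
  history: readonly DeploymentRecord[],
  currentId: string,
): DeploymentRecord | undefined {
  const current = history.find((d) => d.id === currentId);
  if (!current) return undefined;
  const candidates = history.filter(
    (d) =>
      d.appId === current.appId &&
      d.createdAt < current.createdAt && // strictly older than the current one
      d.status === 'deployed',           // never roll back to a failed deploy
  );
  // Newest first; the head of the list is the rollback target.
  candidates.sort((a, b) => b.createdAt - a.createdAt);
  return candidates[0];
}
```

Skipping failed deployments is a design choice made for this sketch; the source does not say how failed entries are handled.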
// core/src/services/deploy-service.ts
import { and, eq, lt } from 'drizzle-orm';
class DeployService {
  async rollback(deploymentId: string): Promise<void> {
    const deployment = await db.query.deployments.findFirst({
      where: eq(deployments.id, deploymentId),
    });
    if (!deployment) throw new Error(`Deployment ${deploymentId} not found`);
    // Get the most recent deployment of the same app created before this one
    const previous = await db.query.deployments.findFirst({
      where: and(
        eq(deployments.appId, deployment.appId),
        lt(deployments.createdAt, deployment.createdAt),
      ),
      orderBy: (deployments, { desc }) => [desc(deployments.createdAt)],
    });
    if (!previous) throw new Error('No previous deployment');
    // Redeploy previous version
    await this.deployToTarget(previous.manifest, deployment.nodeId);
    // Record rollback
    await this.recordDeployment({
      ...previous,
      parentDeploymentId: deploymentId,
      type: 'rollback',
    });
  }
}
Frontend: Vite Dashboard
Design System: Deep Obsidian
A sleek, dark theme optimized for long monitoring sessions:
/* dashboard/src/styles/variables.css */
:root {
--color-bg-primary: #0a0a0f;
--color-bg-secondary: #12121a;
--color-bg-tertiary: #1a1a24;
--color-surface: #1e1e2a;
--color-border: #2a2a3a;
--color-accent-primary: #6366f1;
--color-accent-secondary: #818cf8;
--color-success: #22c55e;
--color-warning: #f59e0b;
--color-error: #ef4444;
--color-text-primary: #f8fafc;
--color-text-secondary: #94a3b8;
--color-text-muted: #64748b;
}
Dashboard Pages
| Page | Features |
|---|---|
| Overview | System health, quick stats, recent alerts |
| Nodes | VPS fleet status, connection health |
| Containers | All containers, start/stop/restart |
| Stacks | Docker Compose management |
| Terminal | xterm.js in-browser SSH |
| Metrics | Real-time CPU, Memory, Network charts |
| Deployments | GitOps pipelines, rollback |
| Backups | Backup status, restore |
| Audit | Action history, compliance |
| Settings | User management, notifications |
Real-Time Data Flow
// dashboard/src/hooks/useMetrics.ts
import { trpc } from '../utils/trpc';
export function useNodeMetrics(nodeId: string) {
const utils = trpc.useUtils();
// Subscribe to real-time updates
trpc.metrics.onUpdate.useSubscription(
{ nodeId },
{
onData(metrics) {
utils.metrics.getNode.setData({ nodeId }, metrics);
},
}
);
return trpc.metrics.getNode.useQuery({ nodeId });
}
Infrastructure Layout
Directory Structure
server-box/
├── core/ # Hono + tRPC API server
│ ├── src/
│ │ ├── routers/ # tRPC procedure definitions
│ │ ├── services/ # Business logic
│ │ ├── middleware/ # Auth, audit, logging
│ │ ├── schema/ # Drizzle schema definitions
│ │ └── lib/ # Utilities
│ └── package.json
├── dashboard/ # Vite + React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── pages/ # Route pages
│ │ ├── hooks/ # Custom hooks
│ │ └── styles/ # Deep Obsidian CSS
│ └── package.json
├── collector/ # Go metrics collector
│ ├── internal/ # Collection logic
│ └── pkg/ # Reusable modules
├── ansible/ # Infrastructure as Code
│ └── hardening.yml # Server security
├── scripts/ # Deployment utilities
├── docs/ # Documentation
└── docker-compose.yml # Local dev stack
Docker Compose Stack
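services:
  postgres:
    image: postgres:16-alpine
    environment:
      # The official image requires a superuser password on first start
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
  core:
    build: ./core
    depends_on:
      - postgres
      - redis
    ports:
      - "3000:3000"
  dashboard:
    build: ./dashboard
    ports:
      - "5173:5173"
volumes:
  postgres_data:
Deployment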
Quick Start
# Clone and setup
git clone https://github.com/bhargab-pratim-sarma/server-box.git
cd server-box
# Configure environment
cp .env.example .env
cp core/.env.example core/.env
cp dashboard/.env.example dashboard/.env
# Start services
docker compose up -d
# Or local development
./start.sh
VPS Node Configuration
# Register new VPS node
curl -X POST https://your-server-box/api/nodes \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "node-1",
    "host": "203.0.113.10",
    "sshUser": "root",
    "sshKeyPath": "/root/.ssh/id_rsa"
  }'
Security Hardening
# Run Ansible hardening playbook
cd ansible
ansible-playbook -i inventory hardening.yml --limit production
Roadmap
Phase 1 — Core Foundation (Complete)
- Hono + tRPC API server
- PostgreSQL with Drizzle ORM
- Basic auth with JWT
Phase 2 — Docker Integration (Complete)
- Container lifecycle management
- Image management
- Stack orchestration
Phase 3 — SSH Terminal (Complete)
- Connection pooling
- Tmux session management
- In-browser terminal with xterm.js
Phase 4 — Metrics Collection (Complete)
- Go-based collector
- Real-time SSE streaming
- Dashboard charts
Phase 5 — GitOps Engine (Complete)
- Deployment pipelines
- Rollback support
- Preview environments
Phase 6 — RBAC & Audit (Complete)
- Role-based permissions
- Immutable audit trail
- EGA (Ephemeral Guest Access)
Phase 7 — Advanced Features (Complete)
- Backup to Google Drive
- Notification system
- Kubernetes module
Phase 8 — Mobile & SDK (In Progress)
- Mobile companion app
- Public SDK
Phase 9 — AI Integration (Planned)
- Predictive scaling
- Anomaly detection
- Log analysis
Phase 10 — Multi-Region (Future)
- Distributed deployment
- Global load balancing
