Déjà System Specification & Architecture Reference
Deterministic, team-level institutional memory for production incidents. This document defines the core physics, normalization pipeline, integration standards, security posture, validation math, lifecycle semantics, and troubleshooting, including how different plan tiers (Observer, Pro, Enterprise) change recall depth and data retention behavior.
1. Core concepts
01 // CORE_PHYSICS
The mathematical principles behind the deterministic engine and how it behaves as a team-level system across different recall horizons (90-day Observer, 10-year Pro, multi-year Enterprise). No inference. No probabilistic guesses.
The Thesis: Rejection of Inference
Most incident resolution tools rely on probabilistic models (LLMs, embeddings, similarity) to guess relationships. Déjà rejects this for production infrastructure: a "high confidence guess" is still noise if an operator must manually verify it.
Promise: Déjà offers proven matches or silence. Silence is preferable to hallucination.
Trust Properties
- Zero hallucination: cannot invent a link that does not exist.
- Zero data leakage: source code is never passed to a third-party generative model.
- Auditability: every match traces to a commit hash, path identity, or error signature.
"…matching the canonical path… comprises a deterministic match that does not require real-time inference by a machine learning model."
File identity is resolved through Git ancestry and explicit renames (e.g., git log --follow). Copy/paste/delete boundaries create a new identity: no guessing.
Déjà optimizes Signal-to-Noise over recall. If the system cannot mathematically validate a link, it does not surface it.
1 = proven match. 0 = no match. "Likely" is not a state.
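As an illustration of the file-identity rule in this section, the sketch below resolves a historical path to its current canonical path by following explicit rename records only, such as those Git reports for a followed file. The `RENAMES` table, the intermediate path `src/auth/Login.js`, and the function name are hypothetical; this is a sketch under those assumptions, not Déjà's implementation.

```python
# Hypothetical explicit rename records (old path -> new path).
# Nothing is inferred from content similarity.
RENAMES = {
    "src/legacy/OldLoginController.js": "src/auth/Login.js",
    "src/auth/Login.js": "src/auth/NewLogin.ts",
}


def canonical_path(path: str, renames: dict) -> str:
    """Follow explicit renames until the newest known path is reached."""
    seen = {path}
    while path in renames:
        path = renames[path]
        if path in seen:  # defensive: a rename cycle would loop forever
            raise ValueError(f"rename cycle detected at {path}")
        seen.add(path)
    return path
```

A path with no rename record resolves to itself, which mirrors the rule that a copied file starts a new identity rather than inheriting one.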
Time travel + real-time
Lane 1 backfills default-branch history (completeness over latency) within the configured recall horizon for the team (90 days on Observer; up to 10 years by default on Pro and above). Lane 2 ingests real-time webhooks (low latency). Both feed a shared normalization engine to ensure identical fingerprints for old and new incidents.
LANE_1 (HISTORICAL[ORG_PLAN_HORIZON]) + LANE_2 (REAL_TIME) -> NORMALIZATION -> CANONICAL_INDEX -> DETERMINISTIC_MATCHER
Fix → Broadcast
Instead of "Crash → Fix," Déjà enables "Fix → Broadcast." Verified fixes propagate across dependency graphs to warn downstream services running vulnerable versions.
| Status | Meaning |
|---|---|
| Exposed | Service runs code containing known defect pattern or imports vulnerable dependency version. |
| Patched | Service applied explicit patch (commit or version bump confirms). |
| Immunized | Service moved Exposed → Patched without an incident event. |
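The three statuses in the table can be read as a small decision rule. The sketch below is one possible encoding; the `ServiceState` fields and the `classify` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    EXPOSED = "Exposed"
    PATCHED = "Patched"
    IMMUNIZED = "Immunized"


@dataclass
class ServiceState:
    was_exposed: bool      # ran the defect pattern or a vulnerable dependency version
    patch_confirmed: bool  # a commit or version bump confirms the patch
    had_incident: bool     # an incident event was recorded for this defect


def classify(state: ServiceState) -> Optional[Status]:
    """Map observed facts to a status; None means the service was never exposed."""
    if not state.was_exposed:
        return None
    if not state.patch_confirmed:
        return Status.EXPOSED
    # Patched without ever having an incident counts as Immunized.
    return Status.PATCHED if state.had_incident else Status.IMMUNIZED
```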
2. The normalization engine
02 // NORMALIZATION_ENGINE
How raw, noisy telemetry becomes stable fingerprints that can collide across time and tool boundaries.
Zero tolerance for un-hydrated code
Minified/obfuscated stack traces are rejected at the edge. Unusable payloads are "data pollution." Déjà does not guess original source paths.
- Reject minified bundles (e.g., a.js)
- Reject hashed filenames (e.g., main.8f9a21b4.chunk.js)
- Reject vendor frames (node_modules, runtime) unless whitelisted
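A minimal sketch of the three rejection rules above, assuming simple filename heuristics. The regular expressions, the reason strings, and the prefix-based allowlist are assumptions made for illustration; the actual Entropy Gate rules are not specified here, and runtime frames are left out for brevity.

```python
import re
from typing import Optional, Sequence

# Assumed heuristics for illustration; real rules may differ.
HASHED_NAME = re.compile(r"\.[0-9a-f]{8,}\.")                  # main.8f9a21b4.chunk.js
MINIFIED_NAME = re.compile(r"^[a-z0-9]{1,2}\.js$|\.min\.js$")  # a.js, app.min.js
VENDOR_MARKER = "node_modules/"


def rejection_reason(path: str, vendor_allowlist: Sequence[str] = ()) -> Optional[str]:
    """Return why a frame path is rejected, or None if it is usable."""
    name = path.rsplit("/", 1)[-1]
    if HASHED_NAME.search(name):
        return "hashed_filename"
    if MINIFIED_NAME.search(name):
        return "minified_bundle"
    allowed = any(path.startswith(prefix) for prefix in vendor_allowlist)
    if VENDOR_MARKER in path and not allowed:
        return "vendor_frame"
    return None
```

The short-name heuristic would also reject a legitimate two-letter file such as `db.js`, which is one reason a production gate would need more than filename patterns.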
First actionable frame
Traverse frames F0→Fn. Skip vendor/middleware and entropy-rejected frames. Anchor on first hydrated User Code frame.
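The traversal can be sketched as a single pass over frame paths. This assumes frames are ordered F0 (innermost) to Fn, that vendor frames are recognizable by a `node_modules/` segment, and that sentinel patterns are matched with Python's `fnmatchcase`, where `*` also crosses `/`. The glob semantics Déjà actually uses are not stated here, and entropy-rejected frames are omitted for brevity.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def is_skipped(path: str, sentinels: Sequence[str]) -> bool:
    """Vendor frames and sentinel-matched paths are never anchors."""
    if "node_modules/" in path:
        return True
    return any(fnmatchcase(path, pattern) for pattern in sentinels)


def first_actionable_frame(
    frames: Sequence[str], sentinels: Sequence[str] = ()
) -> Optional[str]:
    """Walk F0..Fn and return the first frame that is not skipped."""
    for path in frames:
        if not is_skipped(path, sentinels):
            return path
    return None  # no usable anchor: the safe outcome is silence
```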
"Black hole defense": if anchor appears in generic utility paths, peek to caller and use compound anchoring.
Hash composition (SHA-256)
Fingerprint collisions define identity. Dynamic message segments are sanitized prior to hashing.
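A minimal sketch of sanitize-then-hash, using the three inputs named in the formula that follows. The NUL separator, the `Type: message` invariant layout, and the function names are assumptions; the sanitizer pattern is the one shown in the `.deja.yaml` section.

```python
import hashlib
import re

# One sanitizer rule, taken from the configuration example in this document.
SANITIZERS = [(re.compile(r"TX-[A-Z0-9]{4}-[A-Z]+"), "<tx_id>")]


def exception_invariant(exc_type: str, message: str) -> str:
    """Strip dynamic segments so equal failures produce equal text."""
    for pattern, replacement in SANITIZERS:
        message = pattern.sub(replacement, message)
    return f"{exc_type}: {message}"


def fingerprint(anchor_frame: str, caller_context: str, invariant: str) -> str:
    """SHA-256 over the three components."""
    # Assumption: a NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    payload = "\x00".join((anchor_frame, caller_context, invariant))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two errors that differ only in their transaction ID sanitize to the same invariant and therefore hash to the same fingerprint, which is what lets an old incident and a new one collide.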
Fingerprint = SHA256(AnchorFrame + CallerContext + ExceptionInvariant)
3. Integration specifications
03 // INTEGRATION_STANDARDS
Healthy pipeline prerequisites for observability + version control. Built for least-privilege, audit-ready review.
Source maps first
The Entropy Gate requires fully hydrated traces. A common CI/CD failure is a release that starts reporting errors before its source maps have been uploaded and processed.
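Sentry.init({
  // Scrub sensitive fields before the event leaves the client.
  beforeSend(event) {
    // Replace the request body wholesale rather than filtering field by field.
    if (event.request?.data) event.request.data = "[Redacted]";
    // Drop direct user identifiers.
    if (event.user) { delete event.user.email; delete event.user.ip_address; }
    return event;
  }
});
Least privilege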
Tier 1 (Default): read-only metadata/diffs. Tier 2 (Optional): write scopes for patch PRs (never pushes to protected branches).
Tokens encrypted at rest (AES-256-GCM), short-lived where supported. API-first fetching: diffs processed in memory; no full repo clones.
4. Advanced configuration (.deja.yaml)
04 // ADVANCED_CONFIGURATION
Tuning controls for edge cases. Day 0 defaults prioritize safety. Later tuning reduces fragmentation.
Non-actionable paths
Prevent "black hole" incidents by skipping internal libraries to force deeper anchor selection.
normalization:
  sentinels:
    - "src/middleware/**"
    - "src/utils/http-client.ts"
Consolidate fragmentation
Merge fingerprints that should be treated as the same after refactors or identity breaks.
grouping:
  - target: "src/auth/NewLogin.ts"
    aliases:
      - "src/legacy/OldLoginController.js"
Dirty invariants
Regex rules to strip high-entropy segments that break stable grouping.
sanitizers:
  - pattern: "TX-[A-Z0-9]{4}-[A-Z]+"
    replacement: "<tx_id>"
5. Security & compliance
05 // SECURITY_PRIVACY
CISO due diligence: isolation, retention, encryption, and "no training" guarantees.
Tenant separation
- PostgreSQL Row Level Security (RLS) scoped to vault_id
- Cache/queue namespacing with tenant-prefixed keys
- Data residency: us-east-1 default; optional single-tenant EU deployments
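Tenant-prefixed keys can be sketched as below. The `vault:<id>:` layout and both function names are hypothetical; the point is only that every key carries its vault and that a prefix check cannot be confused by vault IDs that share a leading substring.

```python
def tenant_key(vault_id: str, *parts: str) -> str:
    """Build a cache/queue key that always starts with the owning vault."""
    if not vault_id or ":" in vault_id:
        raise ValueError("vault_id must be non-empty and must not contain ':'")
    return ":".join(("vault", vault_id) + parts)


def belongs_to(key: str, vault_id: str) -> bool:
    """True only if the key was namespaced under exactly this vault."""
    # The trailing ':' stops vault "ac" from matching keys of vault "acme".
    return key.startswith(f"vault:{vault_id}:")
```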
Contract + architecture
- No neural weights: deterministic hashing, no probabilistic inference
- Ephemeral processing: diffs processed in memory then discarded
- Store metadata, not contiguous corpora for model training
Metadata only
We store: file paths, commit hashes, function signatures, line numbers, bounded diff fragments.
We do not store: full file contents, .git directories, full history, unrelated assets.
Bounded + auditable
- Knowledge graph metadata: retained for vault lifetime
- Raw webhooks: retained 7 days for replay/debug then deleted
- Diff contexts: bounded (e.g., 30 days) then purged
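The retention bounds above can be expressed as a lookup plus an age check. This is a sketch: the record kind names are hypothetical, and the 30-day value is the example figure given above rather than a fixed guarantee.

```python
from datetime import datetime, timedelta, timezone

# None means "retained for the lifetime of the vault".
RETENTION = {
    "graph_metadata": None,
    "raw_webhook": timedelta(days=7),
    "diff_context": timedelta(days=30),  # example bound from the list above
}


def is_expired(kind: str, created_at: datetime, now: datetime) -> bool:
    """True when a record of this kind has outlived its retention window."""
    ttl = RETENTION[kind]
    return ttl is not None and now - created_at > ttl
```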
6. The validation orchestrator
06 // VALIDATION_ORCHESTRATOR
How Déjà proves a resolution. Validation is statistical; trust is surfaced as tiers, not guesses.
Statistical validation
Compare pre/post windows normalized by traffic. Validation pauses when traffic lacks significance.
ΔE = (Rate(W_pre) - Rate(W_post)) / Rate(W_pre) * 100
Rate(W) = fingerprinted_errors / total_traffic
-100 penalty
Reverts within 72 hours are the strongest signal of a failed solution signature.
Visibility tiers
| Tier | Threshold | Behavior |
|---|---|---|
| Silence | < 70% | No suggestion shown (preserves trust) |
| Possible Match | 70–90% | Displayed; requires human confirmation |
| Verified Match | > 90% | Auto-displayed as known root cause |
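The boundaries in the table can be made explicit in code. This sketch assumes the threshold applies to a score expressed as a percentage and that exactly 90% falls in the middle tier, as the table's ranges suggest; how the score itself is produced is not specified here.

```python
def visibility_tier(score_pct: float) -> str:
    """Map a percentage score to the tier names used in the table."""
    if score_pct > 90:
        return "Verified Match"
    if score_pct >= 70:
        return "Possible Match"
    return "Silence"
```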
7. The incident lifecycle
07 // SYSTEM_LIFECYCLE
Finite state machine with system-driven transitions. Designed for calm infrastructure.
States
Definitions
- Ingesting: awaiting normalization (<120ms) then dropped or normalized
- Candidate: merged PR touches anchor frame files
- Validating: rate gate active; pauses during low traffic
- Regressed: verified fingerprint reappears ("zombie bug")
- Immunized: patched before incident occurred
8. Troubleshooting
08 // TROUBLESHOOTING
"Why didn't it match?" Because distributed systems require explicit failure analysis.
No match found
- Entropy rejection: minified trace → upload maps before deploy
- Sentinel interference: sentinels too broad → refine patterns
- Black hole: generic utility anchor + unstable message → add sentinel or sanitizer
Wrong fix shown
- Monolith problem: anchor on huge generic file → sentinel it, deepen linkage
- Generic error: invariant entropy too low → throw typed errors; improve messages
Threat Model
09 // THREAT_MODEL
Controls are only meaningful when mapped to adversaries. This section defines assumed threats, mitigations, and residual risk.
| Threat | Mitigation | Residual risk |
|---|---|---|
| Malicious tenant attempting cross-vault inference | Row-level security, strict tenant-scoped keys, request context enforcement, audit logs for all access. | Misconfiguration risk; mitigated by policy tests + continuous verification. |
| Compromised webhook ingestion key | Short-lived tokens where supported; rotate keys; scope keys to vault; rate limiting; anomaly detection; replay protection. | Exposure window until rotation; mitigated by TTL + rapid revocation. |
| Replay attacks on historical payloads | Nonce/timestamp validation; idempotency keys; bounded retention for raw webhooks; signature verification. | Clock skew edge cases; mitigated by allowed window + alerts. |
| Poisoned telemetry (intentional garbage input) | Entropy Gate rejects un-hydrated/minified payloads; schema validation; quarantine on rejection spikes; degraded-mode signals. | Resource pressure; mitigated by backpressure + isolation quotas. |
| Insider access (support / ops) | Least-privilege RBAC; break-glass access with approvals; immutable audit logs; encrypted secrets; customer-visible access reports. | Human factor; mitigated via separation of duties and monitoring. |
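Several mitigations in the table (signature verification, timestamp window, nonce-based replay protection) compose naturally. The sketch below shows one way to combine them with HMAC-SHA256 from the Python standard library. The message layout, the 300-second window, and the class and function names are assumptions, not the webhook contract of any particular provider.

```python
import hashlib
import hmac

ALLOWED_SKEW_SECONDS = 300  # hypothetical acceptance window


def sign(secret: bytes, body: bytes, timestamp: int, nonce: str) -> str:
    """HMAC-SHA256 over timestamp, nonce, and body (assumed layout)."""
    message = f"{timestamp}.{nonce}.".encode("utf-8") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class ReplayGuard:
    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        # In-memory only; a real store would expire nonces with the window.
        self._seen_nonces = set()

    def accept(self, body: bytes, timestamp: int, nonce: str,
               signature: str, now: int) -> bool:
        """Accept a payload once, only if fresh and correctly signed."""
        if abs(now - timestamp) > ALLOWED_SKEW_SECONDS:
            return False  # outside the allowed clock-skew window
        expected = sign(self._secret, body, timestamp, nonce)
        if not hmac.compare_digest(expected, signature):
            return False  # tampered body or wrong key
        if nonce in self._seen_nonces:
            return False  # replay of an already accepted payload
        self._seen_nonces.add(nonce)
        return True
```

The nonce is recorded only after the signature check passes, so an attacker cannot burn valid nonces by sending unsigned garbage.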
System Failure Semantics
10 // FAILURE_SEMANTICS
Formal degraded-operation policy: how the system behaves when dependencies fail, and what it refuses to do when uncertain.
Default: fail closed on trust
- Matching: fail closed (silence) if proof cannot be established.
- Ingestion: accept telemetry, but quarantine/mark degraded if payloads are unusable.
- Validation: do not "declare verified" without statistical confidence.
Explicit degraded modes
- Git provider API unavailable: pause Lane 1 backfill; keep Lane 2 ingest; mark "INDEXING_DEGRADED: VCS_UNREACHABLE".
- Webhook ingestion delayed: maintain ordering where possible; surface "INGEST_LATENCY: ELEVATED"; avoid invalid validation windows.
- Validation insufficient traffic: pause soak until significance returns; do not treat low traffic as success.
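The degraded modes above can be sketched as a function from dependency health to an operating plan. The `Plan` fields and `plan_for` are hypothetical names; the flag strings are the ones quoted in the list. Pausing validation while ingestion is delayed is an interpretation of "avoid invalid validation windows", not a documented rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    run_backfill: bool = True    # Lane 1
    run_realtime: bool = True    # Lane 2
    run_validation: bool = True
    flags: List[str] = field(default_factory=list)


def plan_for(vcs_reachable: bool, ingest_delayed: bool,
             traffic_significant: bool) -> Plan:
    """Decide what keeps running and which degraded flags to surface."""
    plan = Plan()
    if not vcs_reachable:
        plan.run_backfill = False  # Lane 2 keeps ingesting
        plan.flags.append("INDEXING_DEGRADED: VCS_UNREACHABLE")
    if ingest_delayed:
        plan.flags.append("INGEST_LATENCY: ELEVATED")
    if ingest_delayed or not traffic_significant:
        # Assumption: an unreliable window pauses validation instead of passing it.
        plan.run_validation = False
    return plan
```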
Déjà never…
Data Model Overview
11 // DATA_MODEL
Logical schema (conceptual) for reasoning about isolation, lifecycle, and determinism; not SQL.
Core objects
- Vault — tenant boundary. Owns all data + keys.
- Fingerprint — SHA-256 identity of incident signature.
- Alias — canonical path ↔ legacy path mapping (rename graph).
- ResolutionArtifact — PR/commit that claims to fix an incident.
- ValidationEvent — rate gate outcomes + revert signals.
How they connect
Vault 1:N Fingerprint
Fingerprint 0:N ResolutionArtifact (candidates)
ResolutionArtifact 0:N ValidationEvent
Fingerprint 0:N Alias (path history)
Alias N:1 CanonicalPath (by vault)
All entities are vault-scoped. Cross-vault joins are prohibited, enforced at both the query layer and the key namespaces.
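The relationships above can be sketched as plain data classes in which every entity carries its `vault_id`. Field names beyond those implied by the object list are hypothetical, and the single `add_fingerprint` check stands in for the broader enforcement at the query and key layers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Fingerprint:
    vault_id: str
    sha256: str


@dataclass(frozen=True)
class Alias:
    vault_id: str
    legacy_path: str
    canonical_path: str


@dataclass(frozen=True)
class ResolutionArtifact:
    vault_id: str
    fingerprint_sha256: str
    ref: str  # PR or commit that claims the fix


@dataclass(frozen=True)
class ValidationEvent:
    vault_id: str
    artifact_ref: str
    outcome: str  # rate gate outcome or revert signal


@dataclass
class Vault:
    vault_id: str
    fingerprints: List[Fingerprint] = field(default_factory=list)

    def add_fingerprint(self, fingerprint: Fingerprint) -> None:
        """Reject any entity that belongs to a different vault."""
        if fingerprint.vault_id != self.vault_id:
            raise PermissionError("cross-vault reference rejected")
        self.fingerprints.append(fingerprint)
```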
Performance & Scaling Characteristics
12 // PERFORMANCE_SCALING
Low latency is necessary, but predictable capacity behavior is what builds confidence. Bounds below are approximate and deployment-dependent.
Throughput per vault
- Target SLO: <120ms receipt → normalization
- Backpressure + rate limits per key/vault
- Quarantine mode when rejection ratio spikes
Time travel bounds
- Commit traversal is O(n commits) within configured horizon
- Diff parsing bounded by touched files + PR metadata
- Runs as batch lane: completeness over latency
Compaction strategy
- Metadata growth roughly proportional to unique fingerprints + aliases
- Prune raw webhooks after 7 days (replay window)
- Bound diff contexts (e.g., 30 days) then purge
Explicit Non-Goals
13 // NON_GOALS
Sets expectations and prevents mis-selling. Déjà is precise by design.
Déjà does not…
- Auto-fix production incidents
- Replace alerting/monitoring
- Infer intent from commits, tickets, or user behavior
- Learn heuristics from your data to "guess" matches
- Optimize recall at the expense of trust
Déjà is…
- A deterministic recall layer for proven resolutions
- A chain-of-custody builder across telemetry + code history
- A trust engine that surfaces verified fixes, or remains silent
- An audit-friendly memory vault for production incidents
Change Management & Versioning
14 // CHANGE_VERSIONING
How behavior changes over time without breaking auditability.
Compatibility guarantees
- Fingerprint algorithm versions are immutable identifiers (e.g., fp_v1).
- New versions introduced only with documented migration notes.
- Verified matches reference the version used at time of validation.
Migration strategy
- Historical fingerprints are not silently re-written.
- Optional re-index/backfill can compute new fingerprints in parallel.
- UI can show fp_v1 vs fp_v2 collisions for review.
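A sketch of parallel re-indexing under the rules above: the new fingerprint is added next to the old one and the historical value is never rewritten. Both hashing functions here are hypothetical stand-ins (the case-folding change in `fp_v2` is invented purely for illustration); only the version-prefix idea comes from the text.

```python
import hashlib


def fp_v1(anchor: str, invariant: str) -> str:
    digest = hashlib.sha256(f"{anchor}\n{invariant}".encode("utf-8")).hexdigest()
    return f"fp_v1:{digest}"


def fp_v2(anchor: str, invariant: str) -> str:
    # Hypothetical change: v2 case-folds the path before hashing.
    text = f"{anchor.casefold()}\n{invariant}"
    return f"fp_v2:{hashlib.sha256(text.encode('utf-8')).hexdigest()}"


def reindex(record: dict) -> dict:
    """Return a copy with fp_v2 added; existing fingerprints are untouched."""
    fingerprints = dict(record["fingerprints"])  # copy, never mutate history
    fingerprints.setdefault("fp_v2", fp_v2(record["anchor"], record["invariant"]))
    return {**record, "fingerprints": fingerprints}
```

Because each identifier carries its version prefix, a verified match can keep pointing at the exact algorithm that was in force when it was validated.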
Glossary
15 // GLOSSARY
Definitions for non-authors and security reviewers.
| Term | Definition |
|---|---|
| Deterministic Match | A relationship proven by strict equality, cryptographic hash collision, or explicit Git ancestry — never by semantic similarity. |
| Canonical Path | Normalized file identity used for matching across telemetry and version control history. |
| Anchor Frame | First actionable hydrated user-code frame selected from a stack trace after filtering vendor/entropy noise. |
| Invariant | Sanitized error signature template with dynamic segments removed to stabilize grouping. |
| Verified Fix | A candidate solution that passed validation gates (rate gate) and has not been reverted. |
| Silence | Intentional non-output state when the system cannot mathematically prove a link or cannot validate confidence. |
| Immunized | Downstream service applied a broadcasted verified fix before an incident occurred. |
Appendices
16 // APPENDICES
World-class tier enhancements: realistic walkthroughs, worked examples, and control mappings.
End-to-end incident walkthrough
Recommended: include a realistic incident timeline from ingest → match → validation → verified recall on regression.
Validation math worked example
Show Rate Gate windows with traffic normalization and the "2 AM traffic drop" pause/resume behavior.
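As a starting point for such a worked example, the sketch below applies the ΔE and Rate(W) formulas from the validation section and pauses when either window has too little traffic. The `MIN_REQUESTS` floor is a hypothetical stand-in for a real significance test, and the sample counts are illustrative, not measured data.

```python
from typing import Optional, Tuple

MIN_REQUESTS = 1000  # hypothetical significance floor

Window = Tuple[int, int]  # (fingerprinted_errors, total_traffic)


def rate(window: Window) -> float:
    errors, traffic = window
    return errors / traffic


def delta_e(pre: Window, post: Window) -> Optional[float]:
    """Percent reduction in error rate, or None while validation is paused."""
    if min(pre[1], post[1]) < MIN_REQUESTS:
        return None  # low traffic is not treated as success
    pre_rate = rate(pre)
    if pre_rate == 0:
        return None  # nothing to improve on; no reduction can be shown
    return (pre_rate - rate(post)) / pre_rate * 100


# Daytime: 500 errors in 100,000 requests before, 25 in 50,000 after.
daytime = delta_e((500, 100_000), (25, 50_000))  # roughly a 90% reduction

# "2 AM traffic drop": zero errors, but only 120 requests in the post window.
overnight = delta_e((500, 100_000), (0, 120))    # paused, not a pass
```

Normalizing by traffic is what makes the two windows comparable: the post window above has half the requests of the pre window, yet the rates (0.5% versus 0.05%) are still directly comparable.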
Security questionnaire mapping
Map controls to SOC2, ISO 27001, and GDPR requirements. Provide evidence pointers (logs, configs, policies).