Version 1.0 • Public • Status: Active • Classification: Technical Documentation

Déjà System Specification & Architecture Reference

Deterministic, team-level institutional memory for production incidents. This document defines the core physics, normalization pipeline, integration standards, security posture, validation math, lifecycle semantics, and troubleshooting. It also describes how plan tiers (Observer, Pro, Enterprise) change recall depth and data retention behavior.

Structure → Core Physics → Normalization → Integration Standards → Advanced Configuration → Security & Privacy → Validation Orchestrator → System Lifecycle → Troubleshooting → Data Model → Performance & Scaling → Plan Tiers & Retention → Appendices

1. Core concepts

01 // CORE_PHYSICS

The mathematical principles behind the deterministic engine and how it behaves as a team-level system across different recall horizons (90-day Observer, 10-year Pro, multi-year Enterprise). No inference. No probabilistic guesses.

1.1 Determinism vs. probability • Claim 4 • No AI inference

The Thesis: Rejection of Inference

Most incident resolution tools rely on probabilistic models (LLMs, embeddings, similarity) to guess relationships. Déjà rejects this for production infrastructure: a "high confidence guess" is still noise if an operator must manually verify it.

> MATCH: 1 (proven) | > NO_MATCH: 0 (silent)

Promise: Déjà offers proven matches or silence. Silence is preferable to hallucination.

Claim 4: Deterministic matching without real-time ML inference

Trust Properties

Zero hallucination: cannot invent a link that does not exist.
Zero data leakage: source code is never passed to a third-party generative model.
Auditability: every match traces to a commit hash, path identity, or error signature.

"…matching the canonical path… comprises a deterministic match that does not require real-time inference by a machine learning model."

Git Identity vs. Vector Similarity

File identity is resolved through Git ancestry and explicit renames (e.g., git log --follow). Copy/paste/delete boundaries create new identity—no guessing.
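
A minimal sketch of resolving a path to its canonical identity by following explicit renames only. The rename map and the function name are illustrative assumptions, not Déjà's actual data structures; in practice the records would come from Git rename history such as that reported by git log --follow.

```python
def canonical_path(path: str, renames: dict[str, str]) -> str:
    """Follow explicit old -> new rename records until no further rename exists."""
    seen = {path}
    while path in renames:
        path = renames[path]
        if path in seen:  # defensive: a cyclic rename record would loop forever
            raise ValueError(f"rename cycle at {path}")
        seen.add(path)
    return path


# Hypothetical rename history: each key was renamed to its value.
RENAMES = {
    "src/legacy/OldLoginController.js": "src/auth/Login.js",
    "src/auth/Login.js": "src/auth/NewLogin.ts",
}

print(canonical_path("src/legacy/OldLoginController.js", RENAMES))
# A copied file has no rename record, so it keeps its own identity.
print(canonical_path("src/auth/CopiedLogin.ts", RENAMES))
```

The lookup is an exact dictionary match at every step, so a path either has a recorded rename or it does not; nothing is scored or ranked.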

Silence > Alert Fatigue

Déjà optimizes for signal-to-noise ratio over recall. If the system cannot mathematically validate a link, it does not surface it.

Determinism is Binary

1 = proven match. 0 = no match. "Likely" is not a state.

1.2 Dual-lane ingestion architecture

Time travel + real-time

Lane 1 backfills default-branch history (completeness over latency) within the configured recall horizon for the team (90 days on Observer; up to 10 years by default on Pro and above). Lane 2 ingests real-time webhooks (low latency). Both feed a shared normalization engine to ensure identical fingerprints for old and new incidents.

LANE_1 (HISTORICAL[ORG_PLAN_HORIZON]) + LANE_2 (REAL_TIME) -> NORMALIZATION -> CANONICAL_INDEX -> DETERMINISTIC_MATCHER

1.3 Predictive immunity (network effects) • Claim 15

Fix → Broadcast

Instead of "Crash → Fix," Déjà enables "Fix → Broadcast." Verified fixes propagate across dependency graphs to warn downstream services running vulnerable versions.

Status	Meaning
Exposed	Service runs code containing known defect pattern or imports vulnerable dependency version.
Patched	Service applied explicit patch (commit or version bump confirms).
Immunized	Service moved Exposed → Patched without an incident event.
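
The status table can be read as a small classification function. The sketch below is one interpretation under stated assumptions: versions are dotted integers, the fix is identified by the first fixed version, and the service is known to have run a vulnerable version earlier. The names classify, parse_version, and had_incident are hypothetical.

```python
from enum import Enum


class Status(Enum):
    EXPOSED = "Exposed"
    PATCHED = "Patched"
    IMMUNIZED = "Immunized"


def parse_version(v: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '2.4.1' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def classify(running: str, fixed_in: str, had_incident: bool) -> Status:
    """Classify a downstream service against a verified fix."""
    if parse_version(running) < parse_version(fixed_in):
        return Status.EXPOSED  # still runs a vulnerable dependency version
    # At or past the fixed version: the version bump confirms the patch.
    # Without an incident event, the service moved Exposed -> Patched silently.
    return Status.PATCHED if had_incident else Status.IMMUNIZED


print(classify("2.3.9", "2.4.0", had_incident=False).value)
print(classify("2.4.0", "2.4.0", had_incident=True).value)
print(classify("2.5.1", "2.4.0", had_incident=False).value)
```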

2. The normalization engine

02 // NORMALIZATION_ENGINE

How raw, noisy telemetry becomes stable fingerprints that can collide across time and tool boundaries.

2.1 Entropy Gate

Zero tolerance for un-hydrated code

Minified/obfuscated stack traces are rejected at the edge. Unusable payloads are "data pollution." Déjà does not guess original source paths.

> REQUIRED: hydrated_frames // source_maps: present // in_app: true

Reject minified bundles (e.g., a.js)
Reject hashed filenames (e.g., main.8f9a21b4.chunk.js)
Reject vendor frames (node_modules, runtime) unless whitelisted
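
The three rejection rules above can be sketched as a single predicate. The regular expressions and the allowlist parameter are assumed heuristics for illustration; the specification does not define the gate's exact detection rules.

```python
import re

# Assumed heuristics, chosen to match the examples in the list above.
HASHED_NAME = re.compile(r"\.[0-9a-f]{8,}\.")              # e.g. main.8f9a21b4.chunk.js
MINIFIED_NAME = re.compile(r"(^|/)[a-z]\.js$|\.min\.js$")  # e.g. a.js, app.min.js


def passes_entropy_gate(filename: str, in_app: bool,
                        vendor_allowlist: frozenset = frozenset()) -> bool:
    """Return True only for frames that look hydrated and belong to user code."""
    if HASHED_NAME.search(filename) or MINIFIED_NAME.search(filename):
        return False  # minified bundle or hashed build artifact
    is_vendor = (not in_app) or "node_modules/" in filename
    if is_vendor and filename not in vendor_allowlist:
        return False  # vendor/runtime frame that was not explicitly allowed
    return True


for name, in_app in [("a.js", True),
                     ("main.8f9a21b4.chunk.js", True),
                     ("node_modules/express/lib/router.js", False),
                     ("src/billing/invoice.ts", True)]:
    print(name, passes_entropy_gate(name, in_app))
```

The predicate only accepts or rejects a frame. It never attempts to map a minified name back to a source path.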

2.2 Anchor Frame Logic

First actionable frame

Traverse frames F0→Fn. Skip vendor/middleware and entropy-rejected frames. Anchor on first hydrated User Code frame.

> ANCHOR_SELECT: first(in_app && hydrated && !vendor)
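
A minimal sketch of the ANCHOR_SELECT rule, assuming each frame already carries in_app, hydrated, and vendor flags set by earlier pipeline stages. The Frame type and its field names are illustrative, not a documented schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    filename: str
    in_app: bool
    hydrated: bool
    vendor: bool


def select_anchor(frames: list[Frame]) -> Optional[Frame]:
    """Walk F0 -> Fn and return the first in-app, hydrated, non-vendor frame."""
    for frame in frames:
        if frame.in_app and frame.hydrated and not frame.vendor:
            return frame
    return None  # no actionable frame: the event cannot be anchored


frames = [
    Frame("node_modules/express/lib/router.js", in_app=False, hydrated=True, vendor=True),
    Frame("main.8f9a21b4.chunk.js", in_app=True, hydrated=False, vendor=False),
    Frame("src/billing/invoice.ts", in_app=True, hydrated=True, vendor=False),
]
anchor = select_anchor(frames)
print(anchor.filename if anchor else "no anchor")
```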

"Black hole defense": if anchor appears in generic utility paths, peek to caller and use compound anchoring.

2.3 Compound fingerprinting

Hash composition (SHA-256)

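Identity is defined by fingerprint equality: two events that produce the same fingerprint are treated as the same incident. Dynamic message segments are sanitized prior to hashing so that they do not change the hash.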

Fingerprint = SHA256(AnchorFrame + CallerContext + ExceptionInvariant)
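
A sketch of the composition above using Python's hashlib. The field separator, the string form of each component, and the two sanitization patterns (hex literals and digit runs) are assumptions made for illustration; the specification only fixes the three inputs and the hash function.

```python
import hashlib
import re


def exception_invariant(message: str) -> str:
    """Replace dynamic segments (assumed here: hex literals and digit runs)."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    return re.sub(r"\d+", "<n>", message)


def fingerprint(anchor_frame: str, caller_context: str,
                exc_type: str, message: str) -> str:
    """SHA-256 over AnchorFrame + CallerContext + ExceptionInvariant."""
    invariant = f"{exc_type}:{exception_invariant(message)}"
    # A unit-separator byte keeps field boundaries unambiguous after joining.
    material = "\x1f".join([anchor_frame, caller_context, invariant])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


a = fingerprint("src/billing/invoice.ts:charge", "src/api/checkout.ts:submit",
                "NotFoundError", "User 1042 not found")
b = fingerprint("src/billing/invoice.ts:charge", "src/api/checkout.ts:submit",
                "NotFoundError", "User 77 not found")
print(a == b)  # both messages sanitize to "User <n> not found"
```

Joining with a separator rather than plain concatenation avoids two different (anchor, caller) pairs producing the same input string.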

3. Integration specifications

03 // INTEGRATION_STANDARDS

Healthy pipeline prerequisites for observability + version control. Built for least-privilege, audit-ready review.

3.1 Sentry / Datadog configuration

Source maps first

The Entropy Gate requires fully hydrated traces. A common CI/CD failure is deploying before source maps have finished uploading, so the first errors arrive un-hydrated and are rejected.

> GOLDEN_RULE: build → upload_maps (wait success) → deploy

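import * as Sentry from "@sentry/node"; // or "@sentry/browser" for client-side code

Sentry.init({
  dsn: process.env.SENTRY_DSN, // assumption: DSN is supplied via the environment
  // Strip request bodies and user PII before the event leaves the process.
  beforeSend(event) {
    if (event.request?.data) event.request.data = "[Redacted]";
    if (event.user) { delete event.user.email; delete event.user.ip_address; }
    return event;
  }
});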

3.2 Version control permissions

Least privilege

Tier 1 (Default): read-only metadata/diffs. Tier 2 (Optional): write scopes for patch PRs (never pushes to protected branches).

> TIER_1: contents/read, pull_requests/read, metadata/read

> TIER_2: contents/write, pull_requests/write (feature branches only)

Tokens encrypted at rest (AES-256-GCM), short-lived where supported. API-first fetching: diffs processed in memory; no full repo clones.

4. Advanced configuration (.deja.yaml)

04 // ADVANCED_CONFIGURATION

Tuning controls for edge cases. Day 0 defaults prioritize safety. Later tuning reduces fragmentation.

4.1 Sentinel frames

Non-actionable paths

Prevent "black hole" incidents by skipping internal libraries to force deeper anchor selection.

normalization:
  sentinels:
    - "src/middleware/**"
    - "src/utils/http-client.ts"

4.2 Grouping rules

Consolidate fragmentation

Merge fingerprints that should be treated as the same after refactors or identity breaks.

grouping:
  - target: "src/auth/NewLogin.ts"
    aliases:
      - "src/legacy/OldLoginController.js"

4.3 Custom sanitizers

Dirty invariants

Regex rules to strip high-entropy segments that break stable grouping.

sanitizers:
  - pattern: "TX-[A-Z0-9]{4}-[A-Z]+"
    replacement: "<tx_id>"
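
As a quick check of the sanitizer rule above, the sketch below applies the same pattern and replacement with Python's re module. The sample messages are invented for illustration.

```python
import re

# The pattern/replacement pair from the example config above.
SANITIZERS = [(re.compile(r"TX-[A-Z0-9]{4}-[A-Z]+"), "<tx_id>")]


def sanitize(message: str) -> str:
    """Apply each sanitizer in order so dynamic ids collapse to a placeholder."""
    for pattern, replacement in SANITIZERS:
        message = pattern.sub(replacement, message)
    return message


print(sanitize("Payment failed for TX-9F3A-REFUND"))
print(sanitize("Payment failed for TX-00B7-CHARGE"))
```

Both messages reduce to the same invariant, so they hash to the same fingerprint instead of fragmenting into one group per transaction.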

5. Security & compliance

05 // SECURITY_PRIVACY

CISO due diligence: isolation, retention, encryption, and "no training" guarantees.

5.1 Data isolation

Tenant separation

PostgreSQL Row Level Security (RLS) scoped to vault_id
Cache/queue namespacing with tenant-prefixed keys
Data residency: us-east-1 default; optional single-tenant EU deployments

5.2 No training guarantee

Contract + architecture

No neural weights: deterministic hashing, no probabilistic inference
Ephemeral processing: diffs processed in memory then discarded
Only metadata is stored; no contiguous code corpus exists that could be used for model training

5.3 Source code privacy

Metadata only

We store: file paths, commit hashes, function signatures, line numbers, bounded diff fragments.

We do not store: full file contents, .git directories, full history, unrelated assets.

> ENCRYPTION: AES-256 (at rest) // TLS 1.2+ (in transit) // HSTS: enforced

5.4 Data retention

Bounded + auditable

Knowledge graph metadata: retained for vault lifetime
Raw webhooks: retained 7 days for replay/debug then deleted
Diff contexts: bounded (e.g., 30 days) then purged

6. The validation orchestrator

06 // VALIDATION_ORCHESTRATOR

How Déjà proves a resolution. Matching is deterministic (Section 1.1); confirming that a fix actually worked is statistical, and the resulting confidence is surfaced as tiers, not guesses.

6.1 Rate gate (Claim 1)

Statistical validation

Compare error rates in the pre-fix and post-fix windows, each normalized by traffic. Validation pauses when traffic is too low for the comparison to be statistically significant.

ΔE = (Rate(W_pre) - Rate(W_post)) / Rate(W_pre) * 100
Rate(W) = fingerprinted_errors / total_traffic
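
A worked sketch of the two formulas above. The min_traffic threshold is a simplified stand-in for the significance check, which the specification does not define; returning None models the "pause" behavior rather than reporting a result.

```python
from typing import Optional


def rate(fingerprinted_errors: int, total_traffic: int) -> float:
    """Rate(W) = fingerprinted_errors / total_traffic."""
    return fingerprinted_errors / total_traffic


def delta_e(pre_errors: int, pre_traffic: int,
            post_errors: int, post_traffic: int,
            min_traffic: int = 1000) -> Optional[float]:
    """Percentage drop in the traffic-normalized error rate, or None to pause."""
    if pre_traffic < min_traffic or post_traffic < min_traffic:
        return None  # too little traffic: do not treat silence as success
    pre = rate(pre_errors, pre_traffic)
    if pre == 0:
        return None  # no baseline errors, so the ratio is undefined
    post = rate(post_errors, post_traffic)
    return (pre - post) / pre * 100


# 200 errors in 100,000 requests before; 10 errors in 50,000 requests after.
print(delta_e(200, 100_000, 10, 50_000))   # roughly a 90% reduction
# Overnight traffic drop: the post window is too small, so validation pauses.
print(delta_e(200, 100_000, 0, 300))
```

Normalizing by traffic matters in the first call: the raw error count fell by 95%, but the post window also carried half the traffic, so the rate fell by about 90%.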

6.2 Revert detection (Claim 10)

-100 penalty

Reverts within 72 hours are the strongest signal of a failed solution signature.

> PENALTY: Confidence -= 100 // Outcome: mark Failed Attempt
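
A minimal sketch of the penalty rule, assuming the 72-hour window is measured from merge time to revert time and is inclusive at the boundary. The function name and return shape are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

REVERT_WINDOW = timedelta(hours=72)
REVERT_PENALTY = 100


def apply_revert_check(confidence: int, merged_at: datetime,
                       reverted_at: Optional[datetime]) -> tuple:
    """Return (confidence, outcome); outcome is 'Failed Attempt' on an early revert."""
    if reverted_at is not None:
        elapsed = reverted_at - merged_at
        if timedelta(0) <= elapsed <= REVERT_WINDOW:
            return confidence - REVERT_PENALTY, "Failed Attempt"
    return confidence, None


merged = datetime(2024, 1, 1, 0, 0)
print(apply_revert_check(95, merged, merged + timedelta(hours=10)))
print(apply_revert_check(95, merged, merged + timedelta(hours=100)))
print(apply_revert_check(95, merged, None))
```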

6.3 Confidence scoring

Visibility tiers

Tier	Threshold	Behavior
Silence	< 70%	No suggestion shown (preserves trust)
Possible Match	70–90%	Displayed; requires human confirmation
Verified Match	> 90%	Auto-displayed as known root cause
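
The tier table maps directly to a threshold function. The sketch below follows the table's boundaries as written: exactly 90% is still a Possible Match, and anything below 70% is silent.

```python
def visibility_tier(confidence: float) -> str:
    """Map a confidence percentage to the visibility tier from the table."""
    if confidence > 90:
        return "Verified Match"
    if confidence >= 70:
        return "Possible Match"
    return "Silence"


for score in (95, 90, 70, 69.9, -5):
    print(score, visibility_tier(score))
```

A score pushed negative by the revert penalty falls into Silence, so a reverted fix is never suggested.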

7. The incident lifecycle

07 // SYSTEM_LIFECYCLE

Finite state machine with system-driven transitions. Designed for calm infrastructure.

Incident State Machine

States

INGESTING → OPEN → CANDIDATE → VALIDATING → VERIFIED → REGRESSED → IMMUNIZED

System-driven transitions

Definitions

Ingesting: awaiting normalization (target <120ms); the payload is then either normalized or dropped
Candidate: a merged PR touches the anchor frame's files
Validating: rate gate active; pauses during low traffic
Regressed: a verified fingerprint reappears ("zombie bug")
Immunized: patched before an incident occurred
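
One way to encode the lifecycle is an explicit transition table that rejects anything not listed. The table below is an assumed reading of the state list and definitions, not a confirmed specification: it covers only the transitions the definitions describe and leaves IMMUNIZED without incident-driven entries, since that state is reached before an incident occurs.

```python
from enum import Enum


class State(Enum):
    INGESTING = "INGESTING"
    OPEN = "OPEN"
    CANDIDATE = "CANDIDATE"
    VALIDATING = "VALIDATING"
    VERIFIED = "VERIFIED"
    REGRESSED = "REGRESSED"
    IMMUNIZED = "IMMUNIZED"


# Assumed transition table (illustrative only).
ALLOWED = {
    State.INGESTING: {State.OPEN},        # payload normalized
    State.OPEN: {State.CANDIDATE},        # merged PR touches anchor frame files
    State.CANDIDATE: {State.VALIDATING},  # rate gate starts
    State.VALIDATING: {State.VERIFIED},   # rate gate passes
    State.VERIFIED: {State.REGRESSED},    # verified fingerprint reappears
}


def transition(current: State, target: State) -> State:
    """Move to target if the table allows it; otherwise refuse loudly."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = State.INGESTING
for nxt in (State.OPEN, State.CANDIDATE, State.VALIDATING, State.VERIFIED):
    state = transition(state, nxt)
print(state.value)
```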

8. Troubleshooting

08 // TROUBLESHOOTING

"Why didn't it match?" Because distributed systems require explicit failure analysis.

8.1 Debugging misses (false negatives)

No match found

Entropy rejection: minified trace → upload maps before deploy
Sentinel interference: sentinels too broad → refine patterns
Black hole: generic utility anchor + unstable message → add sentinel or sanitizer

8.2 Debugging bad matches (false positives)

Wrong fix shown

Monolith problem: anchor on huge generic file → sentinel it, deepen linkage
Generic error: invariant entropy too low → throw typed errors; improve messages

9. Threat model

09 // THREAT_MODEL

Controls are only meaningful when mapped to adversaries. This section defines assumed threats, mitigations, and residual risk.

Threat	Mitigation	Residual risk
Malicious tenant attempting cross-vault inference	Row-level security, strict tenant-scoped keys, request context enforcement, audit logs for all access.	Misconfiguration risk; mitigated by policy tests + continuous verification.
Compromised webhook ingestion key	Short-lived tokens where supported; rotate keys; scope keys to vault; rate limiting; anomaly detection; replay protection.	Exposure window until rotation; mitigated by TTL + rapid revocation.
Replay attacks on historical payloads	Nonce/timestamp validation; idempotency keys; bounded retention for raw webhooks; signature verification.	Clock skew edge cases; mitigated by allowed window + alerts.
Poisoned telemetry (intentional garbage input)	Entropy Gate rejects un-hydrated/minified payloads; schema validation; quarantine on rejection spikes; degraded-mode signals.	Resource pressure; mitigated by backpressure + isolation quotas.
Insider access (support / ops)	Least-privilege RBAC; break-glass access with approvals; immutable audit logs; encrypted secrets; customer-visible access reports.	Human factor; mitigated via separation of duties and monitoring.
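
The replay mitigations in the table (signature verification, timestamp window, nonce tracking) can be sketched with Python's hmac module. The signed-message layout, the 300-second window, and the in-memory nonce set are assumptions for illustration; a real deployment would use the webhook provider's signing scheme and a shared nonce store.

```python
import hashlib
import hmac

ALLOWED_SKEW_SECONDS = 300  # assumed tolerance for clock skew


def sign(secret: bytes, timestamp: int, nonce: str, body: bytes) -> str:
    """HMAC-SHA256 over timestamp, nonce, and body (assumed message layout)."""
    message = f"{timestamp}.{nonce}.".encode("utf-8") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def accept_webhook(secret: bytes, timestamp: int, nonce: str, body: bytes,
                   signature: str, now: int, seen_nonces: set) -> bool:
    """Accept a payload only if it is fresh, unseen, and correctly signed."""
    if abs(now - timestamp) > ALLOWED_SKEW_SECONDS:
        return False  # outside the allowed window
    if nonce in seen_nonces:
        return False  # replay of an already accepted payload
    expected = sign(secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected, signature):
        return False  # tampered body or wrong key
    seen_nonces.add(nonce)
    return True


secret, body, seen = b"vault-scoped-key", b'{"event":"error"}', set()
sig = sign(secret, 1000, "n1", body)
print(accept_webhook(secret, 1000, "n1", body, sig, 1010, seen))  # first delivery
print(accept_webhook(secret, 1000, "n1", body, sig, 1020, seen))  # replayed
```

The nonce is recorded only after the signature verifies, so an attacker cannot burn valid nonces with unsigned requests.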

10. System failure semantics

10 // FAILURE_SEMANTICS

Formal degraded-operation policy: how the system behaves when dependencies fail — and what it refuses to do when uncertain.

Fail closed vs. fail open

Default: fail closed on trust

Matching: fail closed (silence) if proof cannot be established.
Ingestion: accept telemetry, but quarantine/mark degraded if payloads are unusable.
Validation: do not "declare verified" without statistical confidence.

> GUARANTEE: uncertain_state → SILENCE

Dependency failures

Explicit degraded modes

Git provider API unavailable: pause Lane 1 backfill; keep Lane 2 ingest; mark "INDEXING_DEGRADED: VCS_UNREACHABLE".
Webhook ingestion delayed: maintain ordering where possible; surface "INGEST_LATENCY: ELEVATED"; avoid invalid validation windows.
Validation insufficient traffic: pause soak until significance returns; do not treat low traffic as success.

Explicit non-behavior (what Déjà refuses)

Déjà never…

> NEVER: surface "likely" links without proof

> NEVER: auto-remediate production without human-controlled workflows

> NEVER: treat missing data as a positive validation signal

> NEVER: infer intent from user behavior

11. Data model overview

11 // DATA_MODEL

Logical schema (conceptual) to reason about isolation, lifecycle, and determinism — not SQL.

Entities

Core objects

Vault — tenant boundary. Owns all data + keys.
Fingerprint — SHA-256 identity of incident signature.
Alias — canonical path ↔ legacy path mapping (rename graph).
ResolutionArtifact — PR/commit that claims to fix an incident.
ValidationEvent — rate gate outcomes + revert signals.

Relationships & cardinality

How they connect

Vault 1—N Fingerprint
Fingerprint 0—N ResolutionArtifact (candidates)
ResolutionArtifact 0—N ValidationEvent
Fingerprint 0—N Alias (path history)
Alias N—1 CanonicalPath (by vault)

All entities are vault-scoped. Cross-vault joins are prohibited, enforced both at the query layer and in key namespaces.
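
The entities and cardinalities can be sketched as plain dataclasses. Field names beyond those listed above (kind, passed, commit_sha, alias_paths) are assumptions, and the attach method illustrates the vault-scoping rule at the object level; it does not represent how Déjà enforces it in storage.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationEvent:
    vault_id: str
    kind: str      # e.g. "rate_gate" or "revert" (assumed labels)
    passed: bool


@dataclass
class ResolutionArtifact:
    vault_id: str
    commit_sha: str
    validation_events: list[ValidationEvent] = field(default_factory=list)  # 0..N


@dataclass
class Fingerprint:
    vault_id: str
    sha256: str
    artifacts: list[ResolutionArtifact] = field(default_factory=list)  # 0..N candidates
    alias_paths: list[str] = field(default_factory=list)               # 0..N path history

    def attach(self, artifact: ResolutionArtifact) -> None:
        """Refuse cross-vault links: the artifact must share this vault."""
        if artifact.vault_id != self.vault_id:
            raise PermissionError("cross-vault link rejected")
        self.artifacts.append(artifact)


fp = Fingerprint(vault_id="vault-a", sha256="ab" * 32)
fp.attach(ResolutionArtifact(vault_id="vault-a", commit_sha="1f3c9d2"))
print(len(fp.artifacts))
```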

12. Performance & scaling characteristics

12 // PERFORMANCE_SCALING

Low latency is necessary, but predictable behavior under load is what builds confidence. The bounds below are approximate and deployment-dependent.

Real-time ingest

Throughput per vault

Target SLO: <120ms receipt → normalization
Backpressure + rate limits per key/vault
Quarantine mode when rejection ratio spikes

> CAPACITY: bounded by CPU for hashing + frame filtering

Backfill complexity

Time travel bounds

Commit traversal is O(n) in the number of commits within the configured horizon
Diff parsing bounded by touched files + PR metadata
Runs as batch lane: completeness over latency

> DEFAULT_HORIZON: 10 years on Pro and above (configurable); 90 days on Observer

Graph growth + pruning

Compaction strategy

Metadata growth roughly proportional to unique fingerprints + aliases
Prune raw webhooks after 7 days (replay window)
Bound diff contexts (e.g., 30 days) then purge

> STRATEGY: retain proofs, purge raw payloads
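
A minimal sketch of the "retain proofs, purge raw payloads" strategy, using the 7-day webhook window and the 30-day example bound for diff contexts. The record shape (kind, created_at) and the kind labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Kinds without an entry (e.g. fingerprint metadata) are retained indefinitely.
RETENTION = {
    "raw_webhook": timedelta(days=7),
    "diff_context": timedelta(days=30),  # the document's example bound
}


def purge(records: list, now: datetime) -> list:
    """Keep records that have no TTL or are still inside their TTL."""
    kept = []
    for kind, created_at in records:
        ttl = RETENTION.get(kind)
        if ttl is None or now - created_at <= ttl:
            kept.append((kind, created_at))
    return kept


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
records = [
    ("raw_webhook", now - timedelta(days=3)),
    ("raw_webhook", now - timedelta(days=8)),
    ("diff_context", now - timedelta(days=20)),
    ("diff_context", now - timedelta(days=31)),
    ("fingerprint", now - timedelta(days=400)),
]
print([kind for kind, _ in purge(records, now)])
```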

13. Explicit non-goals

13 // NON_GOALS

Sets expectations and prevents mis-selling. Déjà is precise by design.

Déjà does not…

Auto-fix production incidents
Replace alerting/monitoring
Infer intent from commits, tickets, or user behavior
Learn heuristics from your data to "guess" matches
Optimize recall at the expense of trust

Déjà is…

A deterministic recall layer for proven resolutions
A chain-of-custody builder across telemetry + code history
A trust engine that surfaces verified fixes, or remains silent
An audit-friendly memory vault for production incidents

14. Change management & versioning

14 // CHANGE_VERSIONING

How behavior changes over time without breaking auditability.

Fingerprint logic versioning

Compatibility guarantees

Fingerprint algorithm versions are immutable identifiers (e.g., fp_v1).
New versions introduced only with documented migration notes.
Verified matches reference the version used at time of validation.

> GUARANTEE: proofs always include version context

Re-computation policy

Migration strategy

Historical fingerprints are not silently re-written.
Optional re-index/backfill can compute new fingerprints in parallel.
UI can show fp_v1 vs fp_v2 collisions for review.

> MIGRATION: parallel compute → compare → explicit cutover

15. Glossary

15 // GLOSSARY

Definitions for non-authors and security reviewers.

Term	Definition
Deterministic Match	A relationship proven by strict equality, cryptographic hash collision, or explicit Git ancestry — never by semantic similarity.
Canonical Path	Normalized file identity used for matching across telemetry and version control history.
Anchor Frame	First actionable hydrated user-code frame selected from a stack trace after filtering vendor/entropy noise.
Invariant	Sanitized error signature template with dynamic segments removed to stabilize grouping.
Verified Fix	A candidate solution that passed validation gates (rate gate) and has not been reverted.
Silence	Intentional non-output state when the system cannot mathematically prove a link or cannot validate confidence.
Immunized	Downstream service applied a broadcasted verified fix before an incident occurred.

16. Appendices

16 // APPENDICES

Supplementary material: realistic walkthroughs, worked examples, and control mappings.

Appendix A

End-to-end incident walkthrough

Recommended: include a realistic incident timeline from ingest → match → validation → verified recall on regression.

Appendix B

Validation math worked example

Show Rate Gate windows with traffic normalization and the "2 AM traffic drop" pause/resume behavior.

Appendix C

Security questionnaire mapping

Map controls to SOC2, ISO 27001, and GDPR requirements. Provide evidence pointers (logs, configs, policies).