Déjà
Version 1.0 • Public • Status: Active • Classification: Technical Documentation

Déjà System Specification & Architecture Reference

Deterministic, team-level institutional memory for production incidents. This document defines the core physics, normalization pipeline, integration standards, security posture, validation math, lifecycle semantics, and troubleshooting, including how different plan tiers (Observer, Pro, Enterprise) change recall depth and data retention behavior.

Structure → Core Physics → Normalization → Integration → Config → Security → Validation → Lifecycle → Data → Plans → Appendices

1. Core concepts

01 // CORE_PHYSICS

The mathematical principles behind the deterministic engine and how it behaves as a team-level system across different recall horizons (90-day Observer, 10-year Pro, multi-year Enterprise). No inference. No probabilistic guesses.

1.1 Determinism vs. probability • Claim 4 • No AI inference

The Thesis: Rejection of Inference

Most incident resolution tools rely on probabilistic models (LLMs, embeddings, similarity) to guess relationships. Déjà rejects this for production infrastructure: a "high confidence guess" is still noise if an operator must manually verify it.

> MATCH: 1 (proven)  |  > NO_MATCH: 0 (silent)

Promise: Déjà offers proven matches or silence. Silence is preferable to hallucination.

Claim 4: Deterministic matching without real-time ML inference

Trust Properties

  • Zero hallucination: cannot invent a link that does not exist.
  • Zero data leakage: source code is never passed to a third-party generative model.
  • Auditability: every match traces to a commit hash, path identity, or error signature.
"…matching the canonical path… comprises a deterministic match that does not require real-time inference by a machine learning model."
Git Identity vs. Vector Similarity

File identity is resolved through Git ancestry and explicit renames (e.g., git log --follow). Copy/paste/delete boundaries create new identity—no guessing.
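
A minimal sketch of identity resolution from explicit renames, assuming the rename edges have already been extracted from Git history (for example, from the output of `git log --follow`). The `resolve_canonical` helper, the rename-map shape, and the intermediate path are hypothetical and only illustrate the rule that identity follows recorded renames and nothing else.

```python
def resolve_canonical(path: str, renames: dict[str, str]) -> str:
    """Follow explicit rename edges (old path -> new path) to the current path.

    Only recorded renames are followed. A path with no rename edge is its own
    identity, so copied or re-created files are never linked by guesswork.
    """
    seen = {path}
    while path in renames:
        path = renames[path]
        if path in seen:  # defensive: stop on a cyclic rename history
            break
        seen.add(path)
    return path


# Hypothetical rename history: each key was renamed to its value.
renames = {
    "src/legacy/OldLoginController.js": "src/auth/Login.js",
    "src/auth/Login.js": "src/auth/NewLogin.ts",
}
print(resolve_canonical("src/legacy/OldLoginController.js", renames))
```

A copied file such as `src/auth/CopyOfLogin.ts` has no rename edge, so it resolves to itself and starts a new identity.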

Silence > Alert Fatigue

Déjà optimizes Signal-to-Noise over recall. If the system cannot mathematically validate a link, it does not surface it.

Determinism is Binary

1 = proven match. 0 = no match. "Likely" is not a state.

1.2 Dual-lane ingestion architecture

Time travel + real-time

Lane 1 backfills default-branch history (completeness over latency) within the configured recall horizon for the team (90 days on Observer; up to 10 years by default on Pro and above). Lane 2 ingests real-time webhooks (low latency). Both feed a shared normalization engine to ensure identical fingerprints for old and new incidents.

LANE_1 (HISTORICAL[ORG_PLAN_HORIZON]) + LANE_2 (REAL_TIME) -> NORMALIZATION -> CANONICAL_INDEX -> DETERMINISTIC_MATCHER
1.3 Predictive immunity (network effects) • Claim 15

Fix → Broadcast

Instead of "Crash → Fix," Déjà enables "Fix → Broadcast." Verified fixes propagate across dependency graphs to warn downstream services running vulnerable versions.

Status | Meaning
Exposed | Service runs code containing known defect pattern or imports vulnerable dependency version.
Patched | Service applied explicit patch (commit or version bump confirms).
Immunized | Service moved Exposed → Patched without an incident event.
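
The three statuses can be sketched as a small classifier. This is an illustration under stated assumptions: versions are dotted numeric strings, and `classify`, `parse_version`, and the `had_incident` flag are hypothetical names, not part of Déjà's interface.

```python
from enum import Enum


class Status(Enum):
    EXPOSED = "Exposed"
    PATCHED = "Patched"
    IMMUNIZED = "Immunized"


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '2.4.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def classify(installed: str, fixed_in: str, had_incident: bool) -> Status:
    """Classify a downstream service against a verified fix.

    installed:    dependency version the service currently runs
    fixed_in:     first version that contains the verified fix
    had_incident: whether this service ever recorded the incident
    """
    if parse_version(installed) < parse_version(fixed_in):
        return Status.EXPOSED
    # Patched after an incident vs. patched before one ever fired.
    return Status.PATCHED if had_incident else Status.IMMUNIZED
```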

2. The normalization engine

02 // NORMALIZATION_ENGINE

How raw, noisy telemetry becomes stable fingerprints that can collide across time and tool boundaries.

2.1 Entropy Gate

Zero tolerance for un-hydrated code

Minified/obfuscated stack traces are rejected at the edge. Unusable payloads are "data pollution." Déjà does not guess original source paths.

> REQUIRED: hydrated_frames // source_maps: present // in_app: true
  • Reject minified bundles (e.g., a.js)
  • Reject hashed filenames (e.g., main.8f9a21b4.chunk.js)
  • Reject vendor frames (node_modules, runtime) unless whitelisted
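
A minimal sketch of the three rejection rules above. The regular expressions, the `node_modules/` marker, and the frame dictionary keys (`filename`, `in_app`) are illustrative assumptions; the actual gate's heuristics are not specified in this document.

```python
import re

# Illustrative heuristics only; the real gate's rules are not specified here.
HASHED_NAME = re.compile(r"\.[0-9a-f]{8,}\.")                # main.8f9a21b4.chunk.js
MINIFIED_NAME = re.compile(r"(?:^|/)[a-z]\.js$|\.min\.js$")  # a.js, app.min.js
VENDOR_MARKERS = ("node_modules/",)


def passes_entropy_gate(frame: dict, allowlist: tuple[str, ...] = ()) -> bool:
    """Accept only frames with stable, hydrated, in-app source paths."""
    path = frame.get("filename", "")
    if HASHED_NAME.search(path) or MINIFIED_NAME.search(path):
        return False  # minified or content-hashed bundle
    if any(marker in path for marker in VENDOR_MARKERS):
        return path.startswith(allowlist)  # vendor code: allowlisted prefixes only
    return bool(frame.get("in_app", False))
```

The function never tries to map a rejected path back to a source file, which mirrors the rule that Déjà does not guess original source paths.
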
2.2 Anchor Frame Logic

First actionable frame

Traverse frames F0→Fn. Skip vendor/middleware and entropy-rejected frames. Anchor on first hydrated User Code frame.

> ANCHOR_SELECT: first(in_app && hydrated && !vendor)

"Black hole defense": if anchor appears in generic utility paths, peek to caller and use compound anchoring.

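Anchor selection with the "black hole defense" can be sketched as follows. The frame keys (`in_app`, `hydrated`, `vendor`, `filename`), the `GENERIC_PREFIXES` tuple, and the choice of the next usable frame as the caller are assumptions made for illustration.

```python
from typing import Optional

GENERIC_PREFIXES = ("src/utils/",)  # assumed "generic utility" locations


def select_anchor(frames: list[dict]) -> Optional[tuple[str, ...]]:
    """Pick the first in-app, hydrated, non-vendor frame, scanning F0 -> Fn.

    Returns (anchor,) normally, or a compound (anchor, caller) when the anchor
    sits in a generic utility path. Returns None if no frame qualifies.
    """
    usable = [
        f for f in frames
        if f["in_app"] and f["hydrated"] and not f["vendor"]
    ]
    if not usable:
        return None
    anchor = usable[0]["filename"]
    if anchor.startswith(GENERIC_PREFIXES) and len(usable) > 1:
        return (anchor, usable[1]["filename"])  # peek to the caller
    return (anchor,)
```

Returning `None` rather than a best-effort frame keeps the behavior consistent with the silence-over-guessing rule.
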
2.3 Compound fingerprinting

Hash composition (SHA-256)

Fingerprint collisions define identity. Dynamic message segments are sanitized prior to hashing.

Fingerprint = SHA256(AnchorFrame + CallerContext + ExceptionInvariant)
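
A hedged sketch of the hash composition using Python's standard `hashlib`. The sanitizing regex, the `<id>` placeholder, and the unit-separator byte between parts are assumptions added for the example; the formula above only specifies the three inputs and SHA-256.

```python
import hashlib
import re

# Assumed sanitizer: digit runs and long hex tokens become a placeholder.
_DYNAMIC = re.compile(r"\b(?:[0-9a-f]{8,}|\d+)\b", re.IGNORECASE)


def exception_invariant(exc_type: str, message: str) -> str:
    """Strip dynamic segments so equivalent errors share one template."""
    return f"{exc_type}:{_DYNAMIC.sub('<id>', message)}"


def fingerprint(anchor: str, caller: str, exc_type: str, message: str) -> str:
    """SHA-256 over AnchorFrame + CallerContext + ExceptionInvariant."""
    parts = (anchor, caller, exception_invariant(exc_type, message))
    # A separator avoids ambiguous concatenations ("ab" + "c" vs "a" + "bc").
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

Two errors that differ only in a dynamic segment, such as "order 1842 timed out" and "order 977 timed out", hash to the same fingerprint; changing the anchor or caller produces a different one.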

3. Integration specifications

03 // INTEGRATION_STANDARDS

Healthy pipeline prerequisites for observability + version control. Built for least-privilege, audit-ready review.

3.1 Sentry / Datadog configuration

Source maps first

The Entropy Gate requires fully hydrated traces. A common CI/CD failure is reporting errors before source maps are processed.

> GOLDEN_RULE: build → upload_maps (wait success) → deploy
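// Scrub sensitive fields before an event leaves the process.
Sentry.init({
  beforeSend(event) {
    // Replace request bodies wholesale; they may contain secrets or PII.
    if (event.request?.data) event.request.data = "[Redacted]";
    // Drop direct user identifiers.
    if (event.user) { delete event.user.email; delete event.user.ip_address; }
    return event;
  }
});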
3.2 Version control permissions

Least privilege

Tier 1 (Default): read-only metadata/diffs. Tier 2 (Optional): write scopes for patch PRs (never pushes to protected branches).

> TIER_1: contents/read, pull_requests/read, metadata/read
> TIER_2: contents/write, pull_requests/write (feature branches only)

Tokens encrypted at rest (AES-256-GCM), short-lived where supported. API-first fetching: diffs processed in memory; no full repo clones.

4. Advanced configuration (.deja.yaml)

04 // ADVANCED_CONFIGURATION

Tuning controls for edge cases. Day 0 defaults prioritize safety. Later tuning reduces fragmentation.

4.1 Sentinel frames

Non-actionable paths

Prevent "black hole" incidents by skipping internal libraries to force deeper anchor selection.

normalization:
  sentinels:
    - "src/middleware/**"
    - "src/utils/http-client.ts"
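
How sentinel patterns might be applied during anchor selection, sketched with Python's standard `fnmatch` module. The glob semantics Déjà actually uses are not specified here; in `fnmatch`, `*` also matches `/`, so `**` behaves as "anything below this directory". The `is_sentinel` helper is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the configuration example above.
SENTINELS = ("src/middleware/**", "src/utils/http-client.ts")


def is_sentinel(path: str, patterns: tuple[str, ...] = SENTINELS) -> bool:
    """True if the frame path matches any sentinel pattern and should be skipped."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

Frames for which `is_sentinel` returns true would be skipped, pushing anchor selection to the next usable frame.
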
4.2 Grouping rules

Consolidate fragmentation

Merge fingerprints that should be treated as the same after refactors or identity breaks.

grouping:
  - target: "src/auth/NewLogin.ts"
    aliases:
      - "src/legacy/OldLoginController.js"
4.3 Custom sanitizers

Dirty invariants

Regex rules to strip high-entropy segments that break stable grouping.

sanitizers:
  - pattern: "TX-[A-Z0-9]{4}-[A-Z]+"
    replacement: "<tx_id>"
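
The effect of the rule above can be checked with Python's standard `re` module. This is a sketch: the `sanitize` helper and the sample messages are made up for illustration, and Déjà's regex dialect is assumed to be compatible for this pattern.

```python
import re

# (pattern, replacement) pairs mirroring the sanitizers configuration.
SANITIZERS = [
    (re.compile(r"TX-[A-Z0-9]{4}-[A-Z]+"), "<tx_id>"),
]


def sanitize(message: str) -> str:
    """Apply each sanitizer in order so dynamic tokens collapse to placeholders."""
    for pattern, replacement in SANITIZERS:
        message = pattern.sub(replacement, message)
    return message


print(sanitize("Payment TX-9F3A-ABC failed"))  # Payment <tx_id> failed
```

Messages that differ only in the transaction ID now share one invariant and therefore one fingerprint.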

5. Security & compliance

05 // SECURITY_PRIVACY

CISO due diligence: isolation, retention, encryption, and "no training" guarantees.

5.1 Data isolation

Tenant separation

  • PostgreSQL Row Level Security (RLS) scoped to vault_id
  • Cache/queue namespacing with tenant-prefixed keys
  • Data residency: us-east-1 default; optional single-tenant EU deployments
5.2 No training guarantee

Contract + architecture

  • No neural weights: deterministic hashing, no probabilistic inference
  • Ephemeral processing: diffs processed in memory then discarded
  • Store metadata, not contiguous corpora for model training
5.3 Source code privacy

Metadata only

We store: file paths, commit hashes, function signatures, line numbers, bounded diff fragments.

We do not store: full file contents, .git directories, full history, unrelated assets.

> ENCRYPTION: AES-256 (at rest) // TLS 1.2+ (in transit) // HSTS: enforced
5.4 Data retention

Bounded + auditable

  • Knowledge graph metadata: retained for vault lifetime
  • Raw webhooks: retained 7 days for replay/debug then deleted
  • Diff contexts: bounded (e.g., 30 days) then purged
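
The retention rules can be sketched as a lookup table plus an expiry check. The record kind names and the `is_expired` helper are hypothetical; the 30-day figure is the example value given above, not a fixed guarantee.

```python
from datetime import datetime, timedelta

RETENTION = {
    "raw_webhook": timedelta(days=7),     # replay/debug window
    "diff_context": timedelta(days=30),   # example bound from the text
    "graph_metadata": None,               # kept for the vault lifetime
}


def is_expired(kind: str, created_at: datetime, now: datetime) -> bool:
    """True if a record of this kind has outlived its retention window."""
    ttl = RETENTION[kind]
    return ttl is not None and now - created_at > ttl
```

A purge job would delete every record for which `is_expired` returns true; knowledge graph metadata never expires under this table.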

6. The validation orchestrator

06 // VALIDATION_ORCHESTRATOR

How Déjà proves a resolution. Validation is statistical; trust is surfaced as tiers, not guesses.

6.1 Rate gate (Claim 1)

Statistical validation

Compare the error rate in the pre-fix and post-fix windows, each normalized by traffic. Validation pauses when traffic volume is too low for the comparison to be statistically significant.

ΔE = (Rate(W_pre) - Rate(W_post)) / Rate(W_pre) * 100
Rate(W) = fingerprinted_errors / total_traffic
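
A worked sketch of the rate gate. The formula follows the definitions above; the `MIN_TRAFFIC` floor is an assumed stand-in for the significance test, whose real threshold is not specified in this document.

```python
from typing import Optional

MIN_TRAFFIC = 1_000  # assumed significance floor, for illustration only


def rate(errors: int, traffic: int) -> float:
    """Rate(W) = fingerprinted_errors / total_traffic."""
    return errors / traffic


def delta_e(pre: tuple[int, int], post: tuple[int, int]) -> Optional[float]:
    """Percent reduction in traffic-normalized error rate, or None to pause.

    pre and post are (fingerprinted_errors, total_traffic) for each window.
    """
    if min(pre[1], post[1]) < MIN_TRAFFIC or pre[0] == 0:
        return None  # low traffic or no baseline errors: not evidence of success
    rate_pre, rate_post = rate(*pre), rate(*post)
    return (rate_pre - rate_post) / rate_pre * 100
```

With 120 errors over 60,000 requests before the fix (rate 0.002) and 6 errors over 50,000 requests after it (rate 0.00012), the reduction is (0.002 - 0.00012) / 0.002 * 100 = 94%. A post window with only 40 requests returns `None`, so a quiet night is never counted as a successful fix.
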
6.2 Revert detection (Claim 10)

-100 penalty

Reverts within 72 hours are the strongest signal of a failed solution signature.

> PENALTY: Confidence -= 100 // Outcome: mark Failed Attempt
6.3 Confidence scoring

Visibility tiers

Tier | Threshold | Behavior
Silence | < 70% | No suggestion shown (preserves trust)
Possible Match | 70–90% | Displayed; requires human confirmation
Verified Match | > 90% | Auto-displayed as known root cause
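
The thresholds and the revert penalty from 6.2 can be combined in one small function. How the base confidence score is computed is not specified in this document, so the sketch takes it as an input; `visibility_tier` is a hypothetical name.

```python
def visibility_tier(confidence: float, reverted_within_72h: bool = False) -> str:
    """Map a confidence score (0-100) to a visibility tier."""
    if reverted_within_72h:
        confidence -= 100  # revert penalty: a reverted fix always drops to Silence
    if confidence > 90:
        return "Verified Match"
    if confidence >= 70:
        return "Possible Match"
    return "Silence"
```

A score of exactly 90 falls in the Possible Match band, since Verified Match requires strictly more than 90%.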

7. The incident lifecycle

07 // SYSTEM_LIFECYCLE

Finite state machine with system-driven transitions. Designed for calm infrastructure.

Incident State Machine

States

INGESTING → OPEN → CANDIDATE → VALIDATING → VERIFIED → REGRESSED → IMMUNIZED
System-driven transitions

Definitions

  • Ingesting: awaiting normalization (<120ms) then dropped or normalized
  • Candidate: merged PR touches anchor frame files
  • Validating: rate gate active; pauses during low traffic
  • Regressed: verified fingerprint reappears ("zombie bug")
  • Immunized: patched before incident occurred

8. Troubleshooting

08 // TROUBLESHOOTING

"Why didn't it match?" Because distributed systems require explicit failure analysis.

8.1 Debugging misses (false negatives)

No match found

  • Entropy rejection: minified trace → upload maps before deploy
  • Sentinel interference: sentinels too broad → refine patterns
  • Black hole: generic utility anchor + unstable message → add sentinel or sanitizer
8.2 Debugging bad matches (false positives)

Wrong fix shown

  • Monolith problem: anchor on huge generic file → sentinel it, deepen linkage
  • Generic error: invariant entropy too low → throw typed errors; improve messages

9. Threat model

09 // THREAT_MODEL

Controls are only meaningful when mapped to adversaries. This section defines assumed threats, mitigations, and residual risk.

Threat: Malicious tenant attempting cross-vault inference
Mitigation: Row-level security, strict tenant-scoped keys, request context enforcement, audit logs for all access.
Residual Risk: Misconfiguration risk; mitigated by policy tests + continuous verification.

Threat: Compromised webhook ingestion key
Mitigation: Short-lived tokens where supported; rotate keys; scope keys to vault; rate limiting; anomaly detection; replay protection.
Residual Risk: Exposure window until rotation; mitigated by TTL + rapid revocation.

Threat: Replay attacks on historical payloads
Mitigation: Nonce/timestamp validation; idempotency keys; bounded retention for raw webhooks; signature verification.
Residual Risk: Clock skew edge cases; mitigated by allowed window + alerts.

Threat: Poisoned telemetry (intentional garbage input)
Mitigation: Entropy Gate rejects un-hydrated/minified payloads; schema validation; quarantine on rejection spikes; degraded-mode signals.
Residual Risk: Resource pressure; mitigated by backpressure + isolation quotas.

Threat: Insider access (support / ops)
Mitigation: Least-privilege RBAC; break-glass access with approvals; immutable audit logs; encrypted secrets; customer-visible access reports.
Residual Risk: Human factor; mitigated via separation of duties and monitoring.
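
The replay mitigations (timestamp window, idempotency key, signature verification) can be sketched with Python's standard `hmac` module. The signing scheme, the 300-second window, and the in-memory `seen_ids` set are assumptions for illustration; they do not describe Déjà's actual webhook format.

```python
import hashlib
import hmac

ALLOWED_SKEW_SECONDS = 300  # assumed window; the text only says "allowed window"


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over the timestamp and body, hex-encoded."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: int, seen_ids: set, delivery_id: str) -> bool:
    """Accept a webhook only if it is fresh, unseen, and correctly signed."""
    if abs(now - timestamp) > ALLOWED_SKEW_SECONDS:
        return False  # stale or future-dated: outside the allowed window
    if delivery_id in seen_ids:
        return False  # idempotency key already processed: replay
    if not hmac.compare_digest(sign(secret, timestamp, body), signature):
        return False  # constant-time comparison failed: tampered or wrong key
    seen_ids.add(delivery_id)
    return True
```

The delivery ID is recorded only after the signature check passes, so an attacker cannot burn legitimate IDs by sending unsigned requests.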

10. System failure semantics

10 // FAILURE_SEMANTICS

Formal degraded-operation policy: how the system behaves when dependencies fail — and what it refuses to do when uncertain.

Fail closed vs. fail open

Default: fail closed on trust

  • Matching: fail closed (silence) if proof cannot be established.
  • Ingestion: accept telemetry, but quarantine/mark degraded if payloads are unusable.
  • Validation: do not "declare verified" without statistical confidence.
> GUARANTEE: uncertain_state → SILENCE
Dependency failures

Explicit degraded modes

  • Git provider API unavailable: pause Lane 1 backfill; keep Lane 2 ingest; mark "INDEXING_DEGRADED: VCS_UNREACHABLE".
  • Webhook ingestion delayed: maintain ordering where possible; surface "INGEST_LATENCY: ELEVATED"; avoid invalid validation windows.
  • Validation insufficient traffic: pause soak until significance returns; do not treat low traffic as success.
Explicit non-behavior (what Déjà refuses)

Déjà never…

> NEVER: surface "likely" links without proof
> NEVER: auto-remediate production without human-controlled workflows
> NEVER: treat missing data as a positive validation signal
> NEVER: infer intent from user behavior

11. Data model overview

11 // DATA_MODEL

Logical schema (conceptual) to reason about isolation, lifecycle, and determinism — not SQL.

Entities

Core objects

  • Vault — tenant boundary. Owns all data + keys.
  • Fingerprint — SHA-256 identity of incident signature.
  • Alias — canonical path ↔ legacy path mapping (rename graph).
  • ResolutionArtifact — PR/commit that claims to fix an incident.
  • ValidationEvent — rate gate outcomes + revert signals.
Relationships & cardinality

How they connect

Vault 1—N Fingerprint
Fingerprint 0—N ResolutionArtifact (candidates)
ResolutionArtifact 0—N ValidationEvent
Fingerprint 0—N Alias (path history)
Alias N—1 CanonicalPath (by vault)

All entities are vault-scoped. Cross-vault joins are prohibited, and the prohibition is enforced at both the query layer and the key namespace.
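
The vault-scoping rule can be illustrated with two of the entities above. The field names and the `link` helper are hypothetical; the document defines the schema only conceptually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    vault_id: str
    sha256: str


@dataclass(frozen=True)
class ResolutionArtifact:
    vault_id: str
    commit_hash: str


def link(fp: Fingerprint, artifact: ResolutionArtifact) -> tuple[str, str]:
    """Attach a candidate fix to a fingerprint, refusing cross-vault links."""
    if fp.vault_id != artifact.vault_id:
        raise PermissionError("cross-vault link refused")
    return (fp.sha256, artifact.commit_hash)
```

In a real deployment this check would sit behind the database and key-namespace enforcement rather than replace it; the sketch only shows the invariant.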

12. Performance & scaling characteristics

12 // PERFORMANCE_SCALING

Low latency is necessary, but predictable behavior under load is what builds confidence. Bounds below are approximate and deployment-dependent.

Real-time ingest

Throughput per vault

  • Target SLO: <120ms receipt → normalization
  • Backpressure + rate limits per key/vault
  • Quarantine mode when rejection ratio spikes
> CAPACITY: bounded by CPU for hashing + frame filtering
Backfill complexity

Time travel bounds

  • Commit traversal is O(n commits) within configured horizon
  • Diff parsing bounded by touched files + PR metadata
  • Runs as batch lane: completeness over latency
> DEFAULT_HORIZON: 10 years on Pro and above (configurable); 90 days on Observer
Graph growth + pruning

Compaction strategy

  • Metadata growth roughly proportional to unique fingerprints + aliases
  • Prune raw webhooks after 7 days (replay window)
  • Bound diff contexts (e.g., 30 days) then purge
> STRATEGY: retain proofs, purge raw payloads

13. Explicit non-goals

13 // NON_GOALS

Sets expectations and prevents mis-selling. Déjà is precise by design.

Déjà does not…

  • Auto-fix production incidents
  • Replace alerting/monitoring
  • Infer intent from commits, tickets, or user behavior
  • Learn heuristics from your data to "guess" matches
  • Optimize recall at the expense of trust

Déjà is…

  • A deterministic recall layer for proven resolutions
  • A chain-of-custody builder across telemetry + code history
  • A trust engine that surfaces verified fixes, or remains silent
  • An audit-friendly memory vault for production incidents

14. Change management & versioning

14 // CHANGE_VERSIONING

How behavior changes over time without breaking auditability.

Fingerprint logic versioning

Compatibility guarantees

  • Fingerprint algorithm versions are immutable identifiers (e.g., fp_v1).
  • New versions introduced only with documented migration notes.
  • Verified matches reference the version used at time of validation.
> GUARANTEE: proofs always include version context
Re-computation policy

Migration strategy

  • Historical fingerprints are not silently re-written.
  • Optional re-index/backfill can compute new fingerprints in parallel.
  • UI can show fp_v1 vs fp_v2 collisions for review.
> MIGRATION: parallel compute → compare → explicit cutover

15. Glossary

15 // GLOSSARY

Definitions for non-authors and security reviewers.

Term | Definition
Deterministic Match | A relationship proven by strict equality, cryptographic hash collision, or explicit Git ancestry, never by semantic similarity.
Canonical Path | Normalized file identity used for matching across telemetry and version control history.
Anchor Frame | First actionable hydrated user-code frame selected from a stack trace after filtering vendor/entropy noise.
Invariant | Sanitized error signature template with dynamic segments removed to stabilize grouping.
Verified Fix | A candidate solution that passed validation gates (rate gate) and has not been reverted.
Silence | Intentional non-output state when the system cannot mathematically prove a link or cannot validate confidence.
Immunized | Downstream service applied a broadcasted verified fix before an incident occurred.

16. Appendices

16 // APPENDICES

World-class tier enhancements: realistic walkthroughs, worked examples, and control mappings.

Appendix A

End-to-end incident walkthrough

Recommended: include a realistic incident timeline from ingest → match → validation → verified recall on regression.

Appendix B

Validation math worked example

Show Rate Gate windows with traffic normalization and the "2 AM traffic drop" pause/resume behavior.

Appendix C

Security questionnaire mapping

Map controls to SOC2, ISO 27001, and GDPR requirements. Provide evidence pointers (logs, configs, policies).