predictability-gate
Operational boundary for deciding when not to let AI decide.
This repository documents one concrete way to decide
whether an AI system should be allowed to make predictions at all,
*before* model building begins.
It is not a product.
It is not a framework.
It is not a best practice.
It is an operational gate for stopping AI decisions
when responsibility, safety, or predictability cannot be guaranteed.
What this is
predictability-gate defines a pre-model decision gate used in real
operations to answer a single question:
Should this decision be delegated to AI, or not?
The gate exists to prevent situations where:
- a model is technically feasible,
- accuracy looks acceptable,
- but a wrong prediction would cause irreversible damage.
This repository focuses on stopping decisions, not optimizing models.
What this is NOT
- ❌ A machine learning framework
- ❌ An AI governance standard
- ❌ A compliance checklist
- ❌ A guarantee of safety or correctness
This repository does not claim completeness or universality.
It presents one concrete operational structure
for organizations that must carry real-world responsibility.
Core principles
The Predictability Gate is based on four non-negotiable principles:
- Responsibility comes before accuracy
- “What happens if we are wrong?” matters more than “Can we predict?”
- Irreversible misses must never be delegated to AI
- The final decision to proceed or stop is always made by a human operator
If these principles cannot be satisfied, the correct outcome is not to proceed.
Scope
This gate is designed primarily for numeric time-series AI use cases.
Cases that rely on:
- sound,
- smell,
- images,
- or human sensory judgment
are intentionally separated into different projects
with dedicated safety design and evaluation methods.
Mixing these domains prematurely is explicitly avoided.
Before defining the gate, this repository starts by documenting failure patterns
that made this boundary necessary.
Structure of the gate
The gate is executed within 48–72 hours, before any serious modeling work.
Gate components
- G1 – Information content
  Do the numeric time-series contain sufficient predictive signal?
- G2 – Business tolerance
  Is there an operational “escape route” when predictions are wrong (safe-side fallback or decision deferral)?
- G3 – Temporal consistency
  Are results valid under time-aware validation (no leakage)?
- G4 – Proxy feature potential
  Do lag, dwell time, ratios, or rollups meaningfully improve predictability?
- G5 – Operational dependency & irreversibility (override gate)
  Would a *miss* cause irreversible harm (safety, quality, regulatory, or asset damage)?
- G6 – Rough ROI
  Does the expected benefit justify operational cost?
G5 overrides all other scores.
If a missed prediction is irreversible, the case is rejected.
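As a rough illustration, the override logic can be sketched in code. The score ranges, threshold, and names below are illustrative assumptions, not part of the gate definition; only the G5 override rule is taken directly from this document.

```python
from dataclasses import dataclass

@dataclass
class GateScores:
    """Illustrative 0-2 scores for G1-G4 and G6; G5 is a boolean override flag."""
    g1_signal: int      # information content
    g2_tolerance: int   # business tolerance / escape route
    g3_temporal: int    # time-aware validation holds
    g4_proxy: int       # proxy feature potential
    g6_roi: int         # rough ROI
    g5_irreversible_miss: bool  # override: would a miss be irreversible?

def gate_outcome(s: GateScores, threshold: int = 7) -> str:
    """G5 overrides everything: an irreversible miss rejects the case
    regardless of total score. The threshold values are assumptions."""
    if s.g5_irreversible_miss:
        return "stop"  # rejected no matter how good the other scores are
    total = s.g1_signal + s.g2_tolerance + s.g3_temporal + s.g4_proxy + s.g6_roi
    if total >= threshold:
        return "proceed"        # controlled A/B or shadow deployment
    if total >= threshold - 2:
        return "limited pilot"  # restricted scope with data improvement
    return "stop"
```

Note that even a perfect score on G1–G4 and G6 cannot outvote G5; the first branch never falls through.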
Outcomes
The gate produces one of four outcomes:
- Proceed
  → Controlled A/B or shadow deployment
- Limited pilot
  → Restricted scope with data improvement
- Stop
  → Predictability or value not sufficient
- Separate project
  → Non-structured data or human-judgment–centric case
Responsibility
This gate does not automate acceptance.
The final decision is always made by an explicitly named operational owner.
This is intentional.
AI does not carry responsibility.
People do.
Relationship to other concepts
This repository is part of a broader line of thinking about boundaries and responsibility:
- idk-lamp — a symbolic indicator that AI should stop and defer
- VCDesign / BOA / RP — design philosophies for responsibility boundaries
predictability-gate is where those ideas become operational reality.
Context
This project is designed as a practical signal that marks
the boundary where AI systems must stop deciding
and defer responsibility to humans.
This work originates from ongoing exploration of
design, responsibility, and boundaries
in AI-assisted systems.
1. Failure Cases — Why Predictability Gate Exists
This document comes first for a reason.
Predictability Gate was designed by learning from these failures.
This document does not list detailed incidents or postmortems.
It documents failure patterns that repeatedly appear
*before* serious accidents, quality losses, or loss of trust occur.
These cases explain why predictability-gate exists.
Case 1 — Accuracy looked good, but responsibility was undefined
Situation
- Numeric time-series model achieved acceptable accuracy
- Validation metrics met internal thresholds
- Deployment pressure was high
What went wrong
- No clear owner for **stop / override**
- When predictions were wrong, escalation was delayed
- The system continued operating because “the model said so”
Why the gate would have stopped this
- G2 (Business tolerance) was not satisfied
  No safe fallback or decision deferral existed.
- G5 (Operational dependency & irreversibility) was unclear
Lesson
Accuracy without responsibility creates silent failure.
Case 2 — Non-structured signals were mixed too early
Situation
- Sound / image features were added to numeric data
- Overall model accuracy improved
- Evaluation was done with random cross-validation
What went wrong
- Failure modes were no longer observable
- Temporal leakage was masked by mixed features
- Degradation appeared only after deployment
Why the gate would have stopped this
- Scope separation rule was violated
- G3 (Temporal consistency) could not be validated properly
Lesson
Mixing domains hides failure modes before it improves performance.
Case 3 — A missed prediction caused irreversible damage
Situation
- False positives were manageable
- False negatives were considered “unlikely”
- No explicit irreversibility check was performed
What went wrong
- A single miss led to:
- Safety incident
- Quality escape
- Regulatory breach
- Rollback was not possible
Why the gate would have stopped this
- G5 override
Missed prediction was **irreversible**.
Lesson
Low probability does not reduce irreversible risk.
Case 4 — Pilot scope quietly expanded
Situation
- Started as a limited pilot
- Gradually expanded “because it worked”
- No formal re-approval step existed
What went wrong
- Conditions changed outside evaluated range
- Performance degraded silently
- No one noticed until damage occurred
Why the gate would have stopped this
- Limited pilot rule would have required re-evaluation
- Impact radius was not explicitly defined
Lesson
Expansion without re-gating is uncontrolled deployment.
Case 5 — The system could not be stopped
Situation
- Automated decision was integrated into operations
- Manual override existed only on paper
What went wrong
- Operators hesitated to intervene
- Stopping the system was socially or procedurally difficult
- Damage accumulated over time
Why the gate would have stopped this
- Precondition check for stop authority would have failed
Lesson
A system that cannot be stopped is already unsafe.
Common pattern across failures
Across all cases:
- Models were *technically valid*
- Failures were *organizational and operational*
- Responsibility boundaries were unclear
Predictability Gate exists to surface these issues
**before** model construction begins.
2. Overview — What is Predictability Gate
Predictability Gate is an **operational decision boundary**.
It exists to answer one question *before* any model is built:
Should this decision be delegated to AI, or not?
This gate does not optimize models.
It prevents **irreversible failures caused by wrong predictions**.
Key characteristics:
- Executed in **48–72 hours**
- Applied **before** PoC or modeling
- Focuses on **responsibility, not accuracy**
- Explicitly allows the decision: **do not proceed**
Predictability Gate is not a blocker.
It is a **safety boundary** that protects operations, people, and trust.
Context & Roots
This gate is not an isolated idea.
It is an operational outcome of long-term thinking about boundaries and responsibility.
If you want to understand **why these boundaries must exist**, see the related concepts above (idk-lamp, VCDesign / BOA / RP).
3. Gate Definition
Predictability Gate consists of six evaluation items (G1–G6).
G1 — Information Content
Do the numeric time-series contain sufficient predictive signal?
Typical indicators:
- Target entropy
- Sum of mutual information (top features)
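For discretized series, the two G1 indicators can be estimated with a small sketch. The functions below are an illustrative assumption of how entropy and mutual information might be computed; real signals usually need binning and bias correction first.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discretized series.
    Near-zero MI against the target suggests G1 should fail."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

In a gate review, one would sum the mutual information of the top candidate features against the target and compare it with the target's own entropy.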
G2 — Business Tolerance
Is there an operational escape route when predictions are wrong?
Examples:
- Safe-side fallback
- Decision deferral
- Human confirmation
G3 — Temporal Consistency
Are results valid under time-aware validation?
Checks include:
- Time-series cross-validation
- Shuffle degradation
- Leakage suspicion
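A minimal sketch of a time-aware split, assuming an expanding training window (the fold layout is illustrative, not prescribed by the gate). The point is that training indices always precede test indices, so future information cannot leak into the training folds.

```python
def time_series_splits(n, n_splits=3):
    """Expanding-window splits over n samples: each fold trains on
    everything before the test block, never after it."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))
        test = list(range(k * fold, min((k + 1) * fold, n)))
        yield train, test
```

A shuffle-degradation check then compares model scores under these splits against scores under random splits; if random splits look much better, leakage should be suspected.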
G4 — Proxy Feature Potential
Is there room for improvement using proxy features?
Examples:
- Lag features
- Dwell time
- Ratios
- Rollups
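A minimal sketch of two such proxy features in plain Python (the function names and defaults are illustrative assumptions):

```python
def lag_features(series, lags=(1, 2)):
    """Lag features: value at t-1, t-2, ...; None where no history exists."""
    return {f"lag_{k}": [None] * k + series[:-k] for k in lags}

def rolling_mean(series, window=3):
    """Simple rollup: trailing mean over a fixed window,
    shrinking at the start of the series."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i - lo + 1))
    return out
```

If features like these do not move the G1 indicators, G4 gives little reason to continue.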
G5 — Operational Dependency & Irreversibility (Override)
Would a **miss** cause irreversible damage?
If yes:
This case must not be handled by numeric time-series AI.
G5 overrides all other items.
G6 — Rough ROI
Does expected benefit justify operational cost?
This is a coarse check, not financial approval.
4. Templates
Case Card (5 minutes)
- Line / Equipment:
- Target KPI:
- Evaluation period:
Difficulty Map
- Data structure: S1 / S2 / S3
- Field dependency: D1 / D2 / D3
Initial classification:
- Numeric time-series
- Conditional
- Separate project
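As one possible representation, the case card and difficulty map can be captured as a structured record so gate reviews stay comparable. The field names below are illustrative assumptions mirroring the template, not a mandated schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CaseCard:
    """Five-minute case card; fields mirror the template above."""
    line_equipment: str
    target_kpi: str
    evaluation_start: date
    evaluation_end: date
    data_structure: str    # difficulty map: "S1" / "S2" / "S3"
    field_dependency: str  # difficulty map: "D1" / "D2" / "D3"
    classification: str    # "numeric time-series" / "conditional" / "separate project"
```
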
5. Scoring Rules
Scores are **reference only**.
Absolute Rule
If **G5 = irreversible miss**, the case is rejected regardless of total score.
Scoring exists to support discussion,
not to override responsibility.
6. Irreversibility
Irreversible misses include:
Safety
- Entrapment
- Burns
- Leaks
- Runaway conditions
Quality
- Market leakage
- Recall
- Brand damage
Regulation
- Environmental limits
- Labor safety
- Medical / food compliance
Assets
- Equipment destruction
- Long downtime
If these risks cannot be mitigated by logs alone,
the case must be separated.
7. SOP — Numeric Time-Series Cases
Day 0
- Case card
- Difficulty map
Day 0–1
- Data extraction
- MI / CV preparation
Day 1–3
- G1–G6 evaluation
- G5 override check
- Operator comment
Day 3
- Decision review
8. SOP — Non-Structured Data Cases (Minimum)
- Separate project
- Safety-first design
- Limited pilot only
- Stability KPI approval
- Operator sign-off
9. Consensus Building
Start with:
- Irreversible risk
- Responsibility design
Do not start with:
- Accuracy
- Model performance
Always close with numbers:
- Degradation rate
- Reproducibility
- Escape routes
10. Handover Boundary — Non-Structured Data Cases
This document defines the **handover boundary** for cases that were intentionally excluded
from predictability-gate due to reliance on non-structured data
(sound, smell, images, or human sensory judgment).
This is **not an implementation guide**.
It exists to prevent irreversible failures
when responsibility shifts across teams.
Why this handover exists
The case was excluded **not because it is difficult**, but because:
- A missed prediction could cause **irreversible harm**
- The decision cannot be safely reduced to numeric time-series alone
- Responsibility cannot be clearly enforced within the original gate
This handover is a **risk-aware transfer**, not a rejection.
What was confirmed before handover
Before separating this case, the following were explicitly identified:
- ❌ Numeric time-series alone cannot prevent irreversible misses
- ❌ Logs alone cannot guarantee safe rollback
- ❌ Mixing structured and non-structured data would hide failure modes
The decision to separate was made **before model construction**.
Identified irreversible risks
The following categories were flagged as potentially irreversible if missed:
Safety
Entrapment, burns, leaks, runaway conditions
Quality
Market leakage, recalls, brand damage
Regulation
Environmental limits, labor safety, medical / food compliance
Assets
Equipment destruction, long downtime
If any of these risks cannot be mitigated by logs or alarms alone,
the case must not return to numeric-only handling.
Preconditions for any non-structured approach
Before any modeling or data collection begins,
the following **must be explicitly defined**:
Authority & Control
- Who has the **authority to stop the system**
- How and when manual override is triggered
Rollback
- Whether rollback is possible
- What the rollback target state is
- How long rollback takes
Redundancy
- Whether **double detection** is required
- What happens when detectors disagree
Scope limitation
- Initial deployment **must be limited**:
  - Shadow mode
  - Restricted line
  - Restricted time window
Full-scale deployment as the first step is prohibited.
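One way to keep these preconditions auditable is a simple checklist structure that blocks work until every item is explicitly defined. The key names below are illustrative assumptions, not a mandated schema.

```python
# Preconditions from this handover that must be defined before any
# modeling or data collection begins. Key names are illustrative.
REQUIRED_PRECONDITIONS = {
    "stop_authority",                # who can stop the system
    "override_trigger",              # how and when manual override fires
    "rollback_possible",             # whether rollback exists at all
    "rollback_target_state",         # what state rollback restores
    "rollback_duration",             # how long rollback takes
    "double_detection_required",     # is redundancy mandatory?
    "detector_disagreement_policy",  # what happens when detectors disagree
    "initial_scope",                 # shadow mode / restricted line / time window
}

def missing_preconditions(defined):
    """Return the precondition keys still undefined; work must not
    start until this set is empty."""
    return REQUIRED_PRECONDITIONS - set(defined)
```
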
Minimum evaluation requirements
Any non-structured approach must pass **all** of the following
before being considered for expansion:
- Condition degradation rate ≤ 15%
- Feature reproducibility ≥ 60%
- Impact radius clearly identified:
  - Which line
  - Which shift
  - Which operational unit
If these cannot be demonstrated,
the system must remain limited or stopped.
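These thresholds can be encoded as a single pass/fail check. The function name and the `impact_radius` structure below are illustrative assumptions; only the numeric limits come from this document.

```python
def meets_expansion_requirements(degradation_rate, reproducibility, impact_radius):
    """All handover thresholds must hold before expansion:
    condition degradation <= 15%, feature reproducibility >= 60%,
    and the impact radius named per line / shift / operational unit."""
    return (
        degradation_rate <= 0.15
        and reproducibility >= 0.60
        and all(impact_radius.get(k) for k in ("line", "shift", "unit"))
    )
```

A `False` result means the system stays limited or stopped; it is not a score to be traded off against other factors.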
What is intentionally NOT provided
This handover does **not** include:
- Model architectures
- Feature extraction methods
- Performance targets
- Implementation timelines
Those belong to the receiving team’s responsibility.
This document transfers **risk context**, not solutions.
Responsibility statement
From this point forward:
- The original predictability-gate scope ends here
- The receiving team becomes the **primary risk owner**
- All further decisions must explicitly name a responsible operator
This handover does **not** absolve responsibility.
It **reassigns it clearly**.
Final note
Avoidance is not abandonment.
This boundary exists so that
non-structured data can be handled **deliberately, safely, and honestly**.
If at any point responsibility or irreversibility becomes unclear,
the correct action is to **stop and reassess**.