predictability-gate
Operational boundary for deciding when not to let AI decide.
This repository documents one concrete way to decide
whether an AI system should be allowed to make predictions at all,
*before* model building begins.
It is not a product.
It is not a framework.
It is not a best practice.
It is an operational gate for stopping AI decisions
when responsibility, safety, or predictability cannot be guaranteed.
What this is
predictability-gate defines a pre-model decision gate used in real
operations to answer a single question:
Should this decision be delegated to AI, or not?
The gate exists to prevent situations where:
- a model is technically feasible,
- accuracy looks acceptable,
- but a wrong prediction would cause irreversible damage.
This repository focuses on stopping decisions, not optimizing models.
What this is NOT
- ❌ A machine learning framework
- ❌ An AI governance standard
- ❌ A compliance checklist
- ❌ A guarantee of safety or correctness
This repository does not claim completeness or universality.
It presents one concrete operational structure
for organizations that must carry real-world responsibility.
Core principles
The Predictability Gate is based on four non-negotiable principles:
- Responsibility comes before accuracy
- “What happens if we are wrong?” matters more than “Can we predict?”
- Irreversible misses must never be delegated to AI
- The final decision to proceed or stop is always made by a human operator
If these principles cannot be satisfied, the correct outcome is not to proceed.
Scope
This gate is designed primarily for numeric time-series AI use cases.
Cases that rely on:
- sound,
- smell,
- images,
- or human sensory judgment
are intentionally separated into different projects
with dedicated safety design and evaluation methods.
Mixing these domains prematurely is explicitly avoided.
Before defining the gate, this repository starts by documenting failure patterns
that made this boundary necessary.
Structure of the gate
The gate is executed within 48–72 hours, before any serious modeling work.
Gate components
- G1 – Information content
  Do the numeric time-series contain sufficient predictive signal?
- G2 – Business tolerance
  Is there an operational “escape route” when predictions are wrong (safe-side fallback or decision deferral)?
- G3 – Temporal consistency
  Are results valid under time-aware validation (no leakage)?
- G4 – Proxy feature potential
  Do lag, dwell time, ratios, or rollups meaningfully improve predictability?
- G5 – Operational dependency & irreversibility (override gate)
  Would a *miss* cause irreversible harm (safety, quality, regulatory, or asset damage)?
- G6 – Rough ROI
  Does the expected benefit justify operational cost?
G5 overrides all other scores.
If a missed prediction is irreversible, the case is rejected.
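As a rough illustration, the override logic can be sketched in code. The score ranges, threshold, and names below are illustrative assumptions, not part of the gate definition; only the G5 override rule is taken directly from this document.

```python
from dataclasses import dataclass

@dataclass
class GateScores:
    """Illustrative 0-2 scores for G1-G4 and G6; G5 is a boolean override flag."""
    g1_signal: int      # information content
    g2_tolerance: int   # business tolerance / escape route
    g3_temporal: int    # time-aware validation holds
    g4_proxy: int       # proxy feature potential
    g6_roi: int         # rough ROI
    g5_irreversible_miss: bool  # override: would a miss be irreversible?

def gate_outcome(s: GateScores, threshold: int = 7) -> str:
    """G5 overrides everything: an irreversible miss rejects the case
    regardless of total score. The threshold values are assumptions."""
    if s.g5_irreversible_miss:
        return "stop"  # rejected no matter how good the other scores are
    total = s.g1_signal + s.g2_tolerance + s.g3_temporal + s.g4_proxy + s.g6_roi
    if total >= threshold:
        return "proceed"        # controlled A/B or shadow deployment
    if total >= threshold - 2:
        return "limited pilot"  # restricted scope with data improvement
    return "stop"
```

Note that even a perfect score on G1–G4 and G6 cannot outvote G5; the first branch never falls through.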
Outcomes
The gate produces one of four outcomes:
- Proceed
  → Controlled A/B or shadow deployment
- Limited pilot
  → Restricted scope with data improvement
- Stop
  → Predictability or value not sufficient
- Separate project
  → Non-structured data or human-judgment–centric case
Responsibility
This gate does not automate acceptance.
The final decision is always made by an explicitly named operational owner.
This is intentional.
AI does not carry responsibility.
People do.
Relationship to other concepts
This repository is part of a broader line of thinking about boundaries and responsibility:
- idk-lamp — a symbolic indicator that AI should stop and defer
- VCDesign / BOA / RP — design philosophies for responsibility boundaries
predictability-gate is where those ideas become operational reality.
Context
This project is designed as a practical signal that marks
the boundary where AI systems must stop deciding
and defer responsibility to humans.
This work originates from ongoing exploration of
design, responsibility, and boundaries
in AI-assisted systems.
1. Failure Cases — Why Predictability Gate Exists
This document comes first for a reason.
Predictability Gate was designed by learning from these failures.
This document does not list detailed incidents or postmortems.
It documents failure patterns that repeatedly appear
*before* serious accidents, quality losses, or loss of trust occur.
These cases explain why predictability-gate exists.
Case 1 — Accuracy looked good, but responsibility was undefined
Situation
- Numeric time-series model achieved acceptable accuracy
- Validation metrics met internal thresholds
- Deployment pressure was high
What went wrong
- No clear owner for **stop / override**
- When predictions were wrong, escalation was delayed
- The system continued operating because “the model said so”
Why the gate would have stopped this
- G2 (Business tolerance) was not satisfied
  No safe fallback or decision deferral existed.
- G5 (Operational dependency & irreversibility) was unclear
Lesson
Accuracy without responsibility creates silent failure.
Case 2 — Non-structured signals were mixed too early
Situation
- Sound / image features were added to numeric data
- Overall model accuracy improved
- Evaluation was done with random cross-validation
What went wrong
- Failure modes were no longer observable
- Temporal leakage was masked by mixed features
- Degradation appeared only after deployment
Why the gate would have stopped this
- Scope separation rule was violated
- G3 (Temporal consistency) could not be validated properly
Lesson
Mixing domains hides failure modes before it improves performance.
Case 3 — A missed prediction caused irreversible damage
Situation
- False positives were manageable
- False negatives were considered “unlikely”
- No explicit irreversibility check was performed
What went wrong
- A single miss led to:
- Safety incident
- Quality escape
- Regulatory breach
- Rollback was not possible
Why the gate would have stopped this
- G5 override
Missed prediction was **irreversible**.
Lesson
Low probability does not reduce irreversible risk.
Case 4 — Pilot scope quietly expanded
Situation
- Started as a limited pilot
- Gradually expanded “because it worked”
- No formal re-approval step existed
What went wrong
- Conditions changed outside evaluated range
- Performance degraded silently
- No one noticed until damage occurred
Why the gate would have stopped this
- Limited pilot rule would have required re-evaluation
- Impact radius was not explicitly defined
Lesson
Expansion without re-gating is uncontrolled deployment.
Case 5 — The system could not be stopped
Situation
- Automated decision was integrated into operations
- Manual override existed only on paper
What went wrong
- Operators hesitated to intervene
- Stopping the system was socially or procedurally difficult
- Damage accumulated over time
Why the gate would have stopped this
- Precondition check for stop authority would have failed
Lesson
A system that cannot be stopped is already unsafe.
Common pattern across failures
Across all cases:
- Models were *technically valid*
- Failures were *organizational and operational*
- Responsibility boundaries were unclear
Predictability Gate exists to surface these issues
**before** model construction begins.
2. Overview — What is Predictability Gate
Predictability Gate is an **operational decision boundary**.
It exists to answer one question *before* any model is built:
Should this decision be delegated to AI, or not?
This gate does not optimize models.
It prevents **irreversible failures caused by wrong predictions**.
Key characteristics:
- Executed in **48–72 hours**
- Applied **before** PoC or modeling
- Focuses on **responsibility, not accuracy**
- Explicitly allows the decision: **do not proceed**
Predictability Gate is not a blocker.
It is a **safety boundary** that protects operations, people, and trust.
Context & Roots
This gate is not an isolated idea.
It is an operational outcome of long-term thinking about boundaries and responsibility.
If you want to understand **why these boundaries must exist**, see the related concepts above (idk-lamp, VCDesign / BOA / RP).
3. Gate Definition
Predictability Gate consists of six evaluation items (G1–G6).
G1 — Information Content
Do the numeric time-series contain sufficient predictive signal?
Typical indicators:
- Target entropy
- Sum of mutual information (top features)
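For discretized series, the two G1 indicators can be estimated with a small sketch. The functions below are an illustrative assumption of how entropy and mutual information might be computed; real signals usually need binning and bias correction first.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discretized series.
    Near-zero MI against the target suggests G1 should fail."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

In a gate review, one would sum the mutual information of the top candidate features against the target and compare it with the target's own entropy.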
G2 — Business Tolerance
Is there an operational escape route when predictions are wrong?
Examples:
- Safe-side fallback
- Decision deferral
- Human confirmation
G3 — Temporal Consistency
Are results valid under time-aware validation?
Checks include:
- Time-series cross-validation
- Shuffle degradation
- Leakage suspicion
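A minimal sketch of a time-aware split, assuming an expanding training window (the fold layout is illustrative, not prescribed by the gate). The point is that training indices always precede test indices, so future information cannot leak into the training folds.

```python
def time_series_splits(n, n_splits=3):
    """Expanding-window splits over n samples: each fold trains on
    everything before the test block, never after it."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))
        test = list(range(k * fold, min((k + 1) * fold, n)))
        yield train, test
```

A shuffle-degradation check then compares model scores under these splits against scores under random splits; if random splits look much better, leakage should be suspected.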
G4 — Proxy Feature Potential
Is there room for improvement using proxy features?
Examples:
- Lag features
- Dwell time
- Ratios
- Rollups
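A minimal sketch of two such proxy features in plain Python (the function names and defaults are illustrative assumptions):

```python
def lag_features(series, lags=(1, 2)):
    """Lag features: value at t-1, t-2, ...; None where no history exists."""
    return {f"lag_{k}": [None] * k + series[:-k] for k in lags}

def rolling_mean(series, window=3):
    """Simple rollup: trailing mean over a fixed window,
    shrinking at the start of the series."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i - lo + 1))
    return out
```

If features like these do not move the G1 indicators, G4 gives little reason to continue.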
G5 — Operational Dependency & Irreversibility (Override)
Would a **miss** cause irreversible damage?
If yes:
This case must not be handled by numeric time-series AI.
G5 overrides all other items.
G6 — Rough ROI
Does expected benefit justify operational cost?
This is a coarse check, not financial approval.
4. Templates
Case Card (5 minutes)
- Line / Equipment:
- Target KPI:
- Evaluation period:
Difficulty Map
- Data structure: S1 / S2 / S3
- Field dependency: D1 / D2 / D3
Initial classification:
- Numeric time-series
- Conditional
- Separate project
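As one possible representation, the case card and difficulty map can be captured as a structured record so gate reviews stay comparable. The field names below are illustrative assumptions mirroring the template, not a mandated schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CaseCard:
    """Five-minute case card; fields mirror the template above."""
    line_equipment: str
    target_kpi: str
    evaluation_start: date
    evaluation_end: date
    data_structure: str    # difficulty map: "S1" / "S2" / "S3"
    field_dependency: str  # difficulty map: "D1" / "D2" / "D3"
    classification: str    # "numeric time-series" / "conditional" / "separate project"
```
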
5. Scoring Rules
Scores are **reference only**.
Absolute Rule
If **G5 = irreversible miss**, the case is rejected regardless of total score.
Scoring exists to support discussion,
not to override responsibility.
6. Irreversibility
Irreversible misses include:
Safety
- Entrapment
- Burns
- Leaks
- Runaway conditions
Quality
- Market leakage
- Recall
- Brand damage
Regulation
- Environmental limits
- Labor safety
- Medical / food compliance
Assets
- Equipment destruction
- Long downtime
If these risks cannot be mitigated by logs alone,
the case must be separated.
7. SOP — Numeric Time-Series Cases
Day 0
- Case card
- Difficulty map
Day 0–1
- Data extraction
- MI / CV preparation
Day 1–3
- G1–G6 evaluation
- G5 override check
- Operator comment
Day 3
- Decision review
8. SOP — Non-Structured Data Cases (Minimum)
- Separate project
- Safety-first design
- Limited pilot only
- Stability KPI approval
- Operator sign-off
9. Consensus Building
Start with:
- Irreversible risk
- Responsibility design
Do not start with:
- Accuracy
- Model performance
Always close with numbers:
- Degradation rate
- Reproducibility
- Escape routes
10. Handover Boundary — Non-Structured Data Cases
This document defines the **handover boundary** for cases that were intentionally excluded
from predictability-gate due to reliance on non-structured data
(sound, smell, images, or human sensory judgment).
This is **not an implementation guide**.
It exists to prevent irreversible failures
when responsibility shifts across teams.
Why this handover exists
The case was excluded **not because it is difficult**, but because:
- A missed prediction could cause **irreversible harm**
- The decision cannot be safely reduced to numeric time-series alone
- Responsibility cannot be clearly enforced within the original gate
This handover is a **risk-aware transfer**, not a rejection.
What was confirmed before handover
Before separating this case, the following were explicitly identified:
- ❌ Numeric time-series alone cannot prevent irreversible misses
- ❌ Logs alone cannot guarantee safe rollback
- ❌ Mixing structured and non-structured data would hide failure modes
The decision to separate was made **before model construction**.
Identified irreversible risks
The following categories were flagged as potentially irreversible if missed:
Safety
Entrapment, burns, leaks, runaway conditions
Quality
Market leakage, recalls, brand damage
Regulation
Environmental limits, labor safety, medical / food compliance
Assets
Equipment destruction, long downtime
If any of these risks cannot be mitigated by logs or alarms alone,
the case must not return to numeric-only handling.
Preconditions for any non-structured approach
Before any modeling or data collection begins,
the following **must be explicitly defined**:
Authority & Control
- Who has the **authority to stop the system**
- How and when manual override is triggered
Rollback
- Whether rollback is possible
- What the rollback target state is
- How long rollback takes
Redundancy
- Whether **double detection** is required
- What happens when detectors disagree
Scope limitation
- Initial deployment **must be limited**:
  - Shadow mode
  - Restricted line
  - Restricted time window
Full-scale deployment as the first step is prohibited.
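One way to keep these preconditions auditable is a simple checklist structure that blocks work until every item is explicitly defined. The key names below are illustrative assumptions, not a mandated schema.

```python
# Preconditions from this handover that must be defined before any
# modeling or data collection begins. Key names are illustrative.
REQUIRED_PRECONDITIONS = {
    "stop_authority",                # who can stop the system
    "override_trigger",              # how and when manual override fires
    "rollback_possible",             # whether rollback exists at all
    "rollback_target_state",         # what state rollback restores
    "rollback_duration",             # how long rollback takes
    "double_detection_required",     # is redundancy mandatory?
    "detector_disagreement_policy",  # what happens when detectors disagree
    "initial_scope",                 # shadow mode / restricted line / time window
}

def missing_preconditions(defined):
    """Return the precondition keys still undefined; work must not
    start until this set is empty."""
    return REQUIRED_PRECONDITIONS - set(defined)
```
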
Minimum evaluation requirements
Any non-structured approach must pass **all** of the following
before being considered for expansion:
- Condition degradation rate ≤ 15%
- Feature reproducibility ≥ 60%
- Impact radius clearly identified:
  - Which line
  - Which shift
  - Which operational unit
If these cannot be demonstrated,
the system must remain limited or stopped.
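These thresholds can be encoded as a single pass/fail check. The function name and the `impact_radius` structure below are illustrative assumptions; only the numeric limits come from this document.

```python
def meets_expansion_requirements(degradation_rate, reproducibility, impact_radius):
    """All handover thresholds must hold before expansion:
    condition degradation <= 15%, feature reproducibility >= 60%,
    and the impact radius named per line / shift / operational unit."""
    return (
        degradation_rate <= 0.15
        and reproducibility >= 0.60
        and all(impact_radius.get(k) for k in ("line", "shift", "unit"))
    )
```

A `False` result means the system stays limited or stopped; it is not a score to be traded off against other factors.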
What is intentionally NOT provided
This handover does **not** include:
- Model architectures
- Feature extraction methods
- Performance targets
- Implementation timelines
Those belong to the receiving team’s responsibility.
This document transfers **risk context**, not solutions.
Responsibility statement
From this point forward:
- The original predictability-gate scope ends here
- The receiving team becomes the **primary risk owner**
- All further decisions must explicitly name a responsible operator
This handover does **not** absolve responsibility.
It **reassigns it clearly**.
Final note
Avoidance is not abandonment.
This boundary exists so that
non-structured data can be handled **deliberately, safely, and honestly**.
If at any point responsibility or irreversibility becomes unclear,
the correct action is to **stop and reassess**.