Validation & Methods
Canonical, public methodology for Mindforge Intelligence products.
Last Updated: February 9, 2026
Market State Detector (MSD)
Performance Definitions
Hit rate = hits ÷ (hits + false positives)
Lead time = business days from alert timestamp to first occurrence of the event threshold
Alerts/year = total state activations ÷ years in validation period
BD (business days) = NYSE/Nasdaq trading days
0 BD (zero business days) = alert timestamp on the same trading day as the first hit, published pre‑market before the regular NYSE/Nasdaq cash-session open.
Validation period shown = 2012–2024 (extended research 1990–2024 under NDA)
FP (false positive) = an alert episode that does not meet any state’s pre‑published hit criteria within its scoring horizon (BD) after episode collapse and cooldown. FP is evaluated at the episode level (not per trigger day).
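Taken together, these definitions reduce to simple arithmetic over episode-level records. A minimal sketch follows; the field names and the aggregate statistic used for lead time are illustrative, not the production schema.

```python
# Minimal sketch of the performance definitions above, applied to
# episode-level records. Field names are illustrative, not the
# production schema; the aggregate for lead time (median here) is
# likewise an assumption.
from statistics import median

def summarize(episodes, years_in_period):
    """episodes: list of dicts with keys 'hit' (bool) and
    'lead_bd' (business days from alert to first hit, hits only)."""
    hits = [e for e in episodes if e["hit"]]
    false_positives = [e for e in episodes if not e["hit"]]
    # Hit rate (episode-level precision) = hits / (hits + false positives)
    hit_rate = len(hits) / (len(hits) + len(false_positives))
    lead_time_bd = median(e["lead_bd"] for e in hits)
    alerts_per_year = len(episodes) / years_in_period
    return hit_rate, lead_time_bd, alerts_per_year

# Example: three scored episodes over a two-year window.
example = [
    {"hit": True,  "lead_bd": 0},    # 0 BD: same-day hit, published pre-market
    {"hit": True,  "lead_bd": 4},
    {"hit": False, "lead_bd": None}, # episode-level false positive
]
print(summarize(example, years_in_period=2))
```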
Signal Robustness
MSD classifications undergo perturbation analysis to test stability under measurement uncertainty. By introducing controlled noise into the metric counts (bootstrapped resampling of episode-level outcomes), we verify that precision and recall remain stable rather than depending on specific boundary conditions.
This confirms that published hit rates reflect robust classification behavior, not artifacts of sample-specific thresholds. Full perturbation methodology and confidence intervals are available upon request as part of the institutional validation packet.
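For illustration, a minimal sketch of the resampling idea, assuming episode-level outcomes are available as booleans (hit vs. false positive); the production analysis also covers recall and uses its own resample counts and interval conventions.

```python
# Illustrative sketch of bootstrapped resampling of episode-level
# outcomes to check that precision stays stable. Not the production
# implementation; resample count and interval are assumptions.
import random

def bootstrap_precision(outcomes, n_resamples=10_000, seed=0):
    """outcomes: list of booleans, True = hit, False = false positive
    (one entry per alert episode). Returns an approximate 95% interval."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        sample = [rng.choice(outcomes) for _ in outcomes]  # resample with replacement
        stats.append(sum(sample) / len(sample))
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples)]

# A published hit rate is treated as robust if the interval stays
# comfortably away from the decision boundary of interest.
print(bootstrap_precision([True] * 18 + [False] * 3))
```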
Event Criteria (what counts as a “hit”)
Systemic Stress
Hit if any of: VIX ≥ 35; SPX 5‑day ≤ −3.0%; SPX 10‑day ≤ −6.0%; SPX 20‑day ≤ −10.0%.
Volatility Spike
Hit if any of: VIX ≥ 30; SPX 5‑day ≤ −3.0%; SPX 10‑day ≤ −5.0%.
Stress (Advisory)
Hit if any of: VIX ≥ 30; SPX 5‑day ≤ −3.0%; SPX 10‑day ≤ −5.0%.
Turning
Hit if any of: SPX 10‑day ≤ −2.0%; SPX drawdown ≤ −4.0%; VIX ≥ 25 for 2 consecutive trading days.
Calm
Active when all other alarms are inactive (informational regime classification).
Event Horizons
- Volatility Spike: 5 BD
- Stress (Advisory): 7 BD
- Systemic Stress: 10 BD
- Turning: 20 BD
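For readability, the hit criteria and horizons above can be restated as simple predicates. This is a sketch only; the input naming and percent units are assumptions, and the production rule engine is not shown here.

```python
# Hit criteria and scoring horizons from the lists above, restated as
# predicates. Inputs are daily values derived from SPX and VIX; names
# are illustrative, and SPX returns/drawdowns are in percent.
HORIZON_BD = {
    "volatility_spike": 5,
    "stress_advisory": 7,
    "systemic_stress": 10,
    "turning": 20,
}

def systemic_stress_hit(vix, spx_5d, spx_10d, spx_20d):
    return vix >= 35 or spx_5d <= -3.0 or spx_10d <= -6.0 or spx_20d <= -10.0

def volatility_spike_hit(vix, spx_5d, spx_10d):
    return vix >= 30 or spx_5d <= -3.0 or spx_10d <= -5.0

# Stress (Advisory) shares the Volatility Spike thresholds but is
# scored over a 7 BD horizon instead of 5 BD.
stress_advisory_hit = volatility_spike_hit

def turning_hit(spx_10d, drawdown, vix_ge_25_two_days):
    return spx_10d <= -2.0 or drawdown <= -4.0 or vix_ge_25_two_days
```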
Methodology (Public Summary)
Compliance
For institutional research only. Historical, backtested results (2012–2024 unless noted). Not investment advice. See our Disclaimers.
What we backtest
- MSD evaluates five production market-state alarms: Systemic Stress, Volatility Spike, Turning, Stress (Advisory), and a meta Calm state.
- Performance summaries are maintained in a versioned internal validation archive (single source of truth). A full validation packet with current figures and provenance is available on request via the validation packet form.
Data and period
- Historical period: multi‑year history (e.g., 2012–2024) as reflected in the validation files.
- Inputs: the Mindforge Signal Platform time series and public market data (SPX and VIX).
- Time base: business‑day (BD) calendar; timestamps validated and stored in ISO 8601.
What we do / What we don’t do
We do
- Classify market states with fixed, rules‑based definitions
- Evaluate alerts against independent market events (SPX/VIX)
- Use business‑day horizons and deterministic episode logic
- Publish definitions and validation methods publicly
We don’t
- Provide price targets or trade recommendations
- Tune rules on evaluation windows (no in‑sample optimization)
- Score alerts with look‑ahead or calendar‑day shortcuts
- Claim endorsement by data sources or agencies
How an alarm is evaluated (at a glance)
- Fixed rules: Each alarm is specified by an immutable definition (no in‑sample tuning in the backtest).
- Daily evaluation: Conditions are checked per business day against historical inputs.
- Episodes: Consecutive alert days are collapsed into alert “episodes” using business‑day cooldowns and windows.
- Market events: Independent market events (e.g., VIX and SPX patterns) are identified and deduplicated into event episodes.
- Scoring: An alert episode is a “hit” if a qualifying event occurs within its forward‑looking scoring horizon (in business days); an alert episode with no qualifying event in that horizon is a false positive, consistent with the FP definition above (see the sketch after this list).
- Metrics: Precision, recall, and F1 are computed from episodes; figures reported on the website come from these validation files.
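A compact sketch of the episode logic just described, using illustrative business-day indices; the alert cooldown value is an assumed parameter, not the production setting.

```python
# Sketch of the episode logic above: collapse consecutive alert days
# into episodes, then score each episode forward-only against event
# days within its BD horizon. Days are illustrative BD indices.
def collapse_episodes(alert_days, cooldown_bd=3):
    """alert_days: sorted business-day indices on which the alarm fired.
    Days separated by more than `cooldown_bd` start a new episode."""
    episodes = []
    for day in alert_days:
        if episodes and day - episodes[-1][-1] <= cooldown_bd:
            episodes[-1].append(day)
        else:
            episodes.append([day])
    return episodes

def score_episode(episode, event_days, horizon_bd):
    """Hit if any event occurs within `horizon_bd` BD after the
    episode's first alert day (forward-only; no look-ahead)."""
    start = episode[0]
    return any(start <= ev <= start + horizon_bd for ev in event_days)

episodes = collapse_episodes([10, 11, 12, 30], cooldown_bd=3)
hits = [score_episode(ep, event_days=[14, 55], horizon_bd=10) for ep in episodes]
print(episodes, hits)   # [[10, 11, 12], [30]] [True, False]
```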
Design principles that reduce overfitting risk
- No in‑sample optimization: Backtests run the published rule logic as‑is.
- Forward‑only scoring: Hits require future events within a predefined BD horizon; no look‑ahead.
- Deterministic time handling: All windows/horizons/cooldowns are business‑day based.
- Separation of concerns: Historical performance validation files (manifests) serve as the presentation source of truth (SoT) and are not used by evaluation logic.
- Snapshot parity: Internal regression checks ensure outputs remain stable across versions.
Concrete safeguards
- Independent event labeling and deduplication with a fixed event cooldown (10 BD); see the sketch after this list.
- Episode collapsing to prevent double‑counting during prolonged spells.
- Forward‑only scoring horizons (no look‑ahead leakage).
- Business‑day determinism with ISO‑8601 timestamps.
- Rule immutability per run (no parameter tuning inside the backtest).
- Presentation SoT isolation and regression snapshot parity checks.
- Reproducible artifacts: episode‑level CSVs (alerts, events, hits) for independent verification.
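As a further illustration of the first safeguard above, a minimal sketch of event deduplication under the fixed 10 BD cooldown; the event days are illustrative BD indices.

```python
# Sketch of the event deduplication safeguard: qualifying event days
# that fall within the fixed 10 BD cooldown of an already-labeled
# event are merged into that event episode, so one prolonged market
# episode is not counted several times.
def dedup_events(event_days, cooldown_bd=10):
    deduped = []
    for day in sorted(event_days):
        if not deduped or day - deduped[-1] > cooldown_bd:
            deduped.append(day)   # a new event episode starts here
    return deduped

print(dedup_events([100, 101, 105, 130]))   # -> [100, 130]
```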
Data sources
We use public market data (SPX, VIX) and public environmental datasets (e.g., NOAA, NASA, USGS). References to third‑party data providers are for sourcing only and do not imply endorsement.
Validation protocols (walk‑forward and robustness)
- Held‑out era evaluation on excluded historical eras; no tuning on the evaluation period.
- Walk‑forward validation: expanding-window approach (initial 2-year training window, 1-year test window, rolled forward every 90 days; sketched below). Era splits (SC24/SC25) are validated separately.
- Leave‑one‑era‑out to assess robustness by omitting eras and testing excluded windows.
- Era stratification to surface instability and confirm consistency across market conditions.
- Cross‑domain sanity checks where applicable.
- Optional inactive‑day baselines using identical episode windows are available for context in the internal validation materials.
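A hedged sketch of the walk-forward protocol above, assuming the training window starts at roughly two years and then expands while the one-year test window advances every 90 days; calendar handling is simplified and the production splitter works on the BD calendar.

```python
# Illustrative walk-forward split generator under the assumptions
# stated in the lead-in: fixed start, expanding training window,
# one-year test window, split point rolled forward by 90 days.
from datetime import date, timedelta

def walk_forward_splits(start, end, train_days=2 * 365, test_days=365, roll_days=90):
    split = start + timedelta(days=train_days)
    while split + timedelta(days=test_days) <= end:
        yield (start, split), (split, split + timedelta(days=test_days))
        split += timedelta(days=roll_days)   # training expands, test rolls forward

for train, test in list(walk_forward_splits(date(2012, 1, 1), date(2016, 1, 1)))[:3]:
    print("train", train, "test", test)
```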
For complete overfitting controls and offline reproduction procedures, we provide an auditor‑friendly validation packet on request via the validation packet form.
Reproducibility
- Public figures are sourced from versioned internal validation records; website statistics are derived directly from this single source of truth.
- Validation Packet — an auditor‑friendly package to reproduce metrics without environment access. Request via the form and we will deliver the packet by email; certain sensitive artifacts may require an NDA.
Compliance framing
All analytics are informational research or risk‑classification only and must not be construed as investment advice, solicitation, or performance guarantees.
FAQ
Are the results real‑time or backtested?
Figures shown are historical/backtested for 2012–2024 unless otherwise stated. Live operation began in 2025.
Can I see episode‑level data?
We provide episode‑level CSV exports and versioned validation summaries on request via the validation packet form; certain sensitive artifacts may require an NDA.
Is this investment advice?
No. This is an informational classification tool. See Disclaimers for details.
Space Weather Early Warning System (SEWS)
What SEWS Does
SEWS classifies space weather risk using upstream Forbush Decrease detection. It provides probabilistic risk context days ahead of many downstream storm-time indicators and public alerts.
SEWS is designed as a complementary upstream scenario planning layer, not a replacement for official space weather forecasts. It does not predict specific events or their timing.
Not affiliated with NOAA, ESA, NASA, or any government space weather service.
SEWS Performance Definitions
Precision = when the system alerts, how often a cataloged Forbush Decrease event occurs within the scoring horizon (14 calendar days)
Events Caught (Recall) = of all cataloged events in the period, what fraction the system detected
Lead time = calendar days from alert classification to Forbush Decrease onset
Alerts/year = average number of risk classifications per year across the validation period
Validation period = 2010-2024 (15 years, 31 cataloged FD events)
Walk-forward validation = 9-fold expanding-window (train on 2010-2016 base, test on each subsequent year 2016-2024)
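Restated as code, these definitions amount to matching alerts against the cataloged Forbush Decrease onsets within the 14-calendar-day horizon. A minimal sketch; dates and field names are illustrative.

```python
# Sketch of the SEWS definitions above: score each alert against the
# cataloged Forbush Decrease (FD) onsets within a 14-calendar-day
# horizon. Dates are illustrative, not catalog entries.
from datetime import date, timedelta

HORIZON = timedelta(days=14)

def score(alerts, fd_onsets):
    """alerts: alert classification dates. fd_onsets: cataloged FD onsets."""
    hits, leads, caught = 0, [], set()
    for a in alerts:
        matches = [fd for fd in fd_onsets if a <= fd <= a + HORIZON]
        if matches:
            hits += 1
            leads.append((min(matches) - a).days)   # calendar-day lead time
            caught.update(matches)
    precision = hits / len(alerts)           # alerts followed by a cataloged FD
    recall = len(caught) / len(fd_onsets)    # "events caught"
    return precision, recall, leads

alerts = [date(2021, 5, 1), date(2021, 8, 10)]
fd_onsets = [date(2021, 5, 7), date(2021, 11, 3)]
print(score(alerts, fd_onsets))   # (0.5, 0.5, [6])
```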
Tier Performance (Full-Period Backtest, 2010-2024)
- ELEVATED: Wider net. Catches more events. More false alarms. Use for early scenario planning.
- CRITICAL: Selective. Higher confidence. Fewer alerts. Worth escalating monitoring posture.
Both tiers deliver multi-day lead time. The tradeoff is coverage vs. confidence: ELEVATED catches more events with more false alarms; CRITICAL is more selective with fewer alerts.
Walk-Forward Validation (Out-of-Sample)
9-fold walk-forward (train on 2010-2016 base + expanding window, test on each subsequent year 2016-2024). 14 events in walk-forward test periods.
| Tier | Precision (fold-avg) | Events Caught | Lead (event-weighted) |
|---|---|---|---|
| ELEVATED | 17% ± 17% | 79% (11/14 events) | ~6.3 days |
| CRITICAL | 75% ± 14% | 57% (8/14 events) | ~7.6 days |
What validates: Multi-day lead time holds out-of-sample (6-8 days). CRITICAL precision is strong (75% fold-averaged). ELEVATED recall is strong (79%, catches most events).
Known limitations: High fold-to-fold variance on ELEVATED precision (sparse events per year). Small sample (14 events in walk-forward test periods).
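For clarity on the fold-averaged figures in the table, a minimal sketch of the per-fold layout and the mean ± spread computation; the fold boundaries and per-fold values here are illustrative, and standard deviation is assumed for the spread.

```python
# Sketch of the fold-averaged reporting used in the table above:
# precision is computed per fold (one out-of-sample test year at a
# time, expanding training window), then averaged across folds with
# a +/- spread. Boundaries and values below are illustrative only.
from statistics import mean, stdev

test_years = list(range(2016, 2025))   # 9 out-of-sample test years
folds = [(list(range(2010, year)), year) for year in test_years]  # expanding train window

def fold_averaged(per_fold_precision):
    return mean(per_fold_precision), stdev(per_fold_precision)

# Hypothetical per-fold precision values, for illustration only.
print(len(folds), fold_averaged([0.6, 0.8, 1.0, 0.75, 0.5, 0.8, 0.9, 0.7, 0.7]))
```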
SEWS Methodology
Detection approach
SEWS uses environmental stress indicators and public solar and geomagnetic data streams to classify Forbush Decrease risk. Validation is performed against a pre-published event catalog (2010-2024) using a fixed 14-day scoring horizon.
Data sources
- Public space weather and geomagnetic datasets (for example, NOAA SWPC)
- Public CME and solar event catalogs (for example, NASA DONKI)
- Environmental monitoring networks used for event cataloging and cross-checks
- Mindforge Signal Platform (composite indices)
What we do / What we don't do
We do
- Classify environmental risk states (NORMAL / ELEVATED / CRITICAL)
- Deliver probabilistic upstream context days before official alerts
- Walk-forward validate with expanding-window methodology
- Publish tier performance transparently (including false alarm rates)
We don't
- Predict specific events or their timing
- Replace NOAA/ESA confirmed-event forecasts
- Claim affiliation with government space weather services
- Guarantee future performance based on historical validation
Overfitting controls
- Out-of-sample walk-forward validation (9 folds, 2016-2024)
- Separate reporting of full-period backtests (descriptive) vs. walk-forward results (out-of-sample)
- Event‑weighted lead time is used for the out‑of‑sample headline lead time (consistent with the published validation notes; see the sketch after this list)
- Transparent tier tradeoffs: ELEVATED prioritizes event coverage, CRITICAL prioritizes confidence
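To make the lead-time convention concrete, a minimal sketch contrasting event-weighted averaging (pool all caught events) with fold-averaged averaging (average the per-fold means); the lead-time values are illustrative.

```python
# Event-weighted vs. fold-averaged lead time. Event weighting pools
# lead times across all caught events; fold averaging gives each fold
# equal weight even when it contains few events.
from statistics import mean

def event_weighted_lead(folds):
    """folds: list of lists, each holding lead times (days) for the
    events caught in that fold."""
    return mean(lead for fold in folds for lead in fold)

def fold_averaged_lead(folds):
    return mean(mean(fold) for fold in folds if fold)

folds = [[8, 6], [5], [], [9, 7, 6]]      # illustrative lead times
print(event_weighted_lead(folds), fold_averaged_lead(folds))
```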
Informational research only. Not investment, operational, or safety advice. Past performance does not guarantee future results.