Validation & Methods
Canonical, public methodology for Mindforge Intelligence products.
Last Updated: February 9, 2026
Market State Detector (MSD)
Performance Definitions
Hit rate = hits ÷ (hits + false positives)
Lead time = business days from alert timestamp to first occurrence of the event threshold
Alerts/year = total state activations ÷ years in validation period
BD (business days) = NYSE/Nasdaq trading days
0 BD (zero business days) = alert timestamp on the same trading day as the first hit, published pre‑market before the regular NYSE/Nasdaq cash-session open.
Validation period shown = 2012–2024 (extended research 1990–2024 under NDA)
FP (false positive) = an alert episode that does not meet any state’s pre‑published hit criteria within its scoring horizon (BD) after episode collapse and cooldown. FP is evaluated at the episode level (not per trigger day).
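Taken together, these definitions reduce to simple arithmetic over episode-level records. A minimal sketch follows; the field names and the aggregate statistic used for lead time are illustrative, not the production schema.

```python
# Minimal sketch of the performance definitions above, applied to
# episode-level records. Field names are illustrative, not the
# production schema; the aggregate for lead time (median here) is
# likewise an assumption.
from statistics import median

def summarize(episodes, years_in_period):
    """episodes: list of dicts with keys 'hit' (bool) and
    'lead_bd' (business days from alert to first hit, hits only)."""
    hits = [e for e in episodes if e["hit"]]
    false_positives = [e for e in episodes if not e["hit"]]
    # Hit rate (episode-level precision) = hits / (hits + false positives)
    hit_rate = len(hits) / (len(hits) + len(false_positives))
    lead_time_bd = median(e["lead_bd"] for e in hits)
    alerts_per_year = len(episodes) / years_in_period
    return hit_rate, lead_time_bd, alerts_per_year

# Example: three scored episodes over a two-year window.
example = [
    {"hit": True,  "lead_bd": 0},    # 0 BD: same-day hit, published pre-market
    {"hit": True,  "lead_bd": 4},
    {"hit": False, "lead_bd": None}, # episode-level false positive
]
print(summarize(example, years_in_period=2))
```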
Signal Robustness
MSD classifications undergo perturbation analysis to test stability under measurement uncertainty. By introducing controlled noise into the metric counts (bootstrapped resampling of episode-level outcomes), we verify that precision and recall remain stable rather than depending on specific boundary conditions.
This confirms that published hit rates reflect robust classification behavior, not artifacts of sample-specific thresholds. Full perturbation methodology and confidence intervals are available upon request as part of the institutional validation packet.
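For illustration, a minimal sketch of the resampling idea, assuming episode-level outcomes are available as booleans (hit vs. false positive); the production analysis also covers recall and uses its own resample counts and interval conventions.

```python
# Illustrative sketch of bootstrapped resampling of episode-level
# outcomes to check that precision stays stable. Not the production
# implementation; resample count and interval are assumptions.
import random

def bootstrap_precision(outcomes, n_resamples=10_000, seed=0):
    """outcomes: list of booleans, True = hit, False = false positive
    (one entry per alert episode). Returns an approximate 95% interval."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        sample = [rng.choice(outcomes) for _ in outcomes]  # resample with replacement
        stats.append(sum(sample) / len(sample))
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples)]

# A published hit rate is treated as robust if the interval stays
# comfortably away from the decision boundary of interest.
print(bootstrap_precision([True] * 18 + [False] * 3))
```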
Event Criteria (what counts as a “hit”)
Systemic Stress
Hit if any of: VIX ≥ 35; SPX 5‑day ≤ −3.0%; SPX 10‑day ≤ −6.0%; SPX 20‑day ≤ −10.0%.
Volatility Spike
Hit if any of: VIX ≥ 30; SPX 5‑day ≤ −3.0%; SPX 10‑day ≤ −5.0%.
Stress (Advisory)
Hit if any of: VIX ≥ 30; SPX 5‑day ≤ −3.0%; SPX 10‑day ≤ −5.0%.
Turning
Hit if any of: SPX 10‑day ≤ −2.0%; SPX drawdown ≤ −4.0%; VIX ≥ 25 for 2 consecutive trading days.
Calm
Active when all other alarms are inactive (informational regime classification).
Event Horizons
- Volatility Spike: 5 BD
- Stress (Advisory): 7 BD
- Systemic Stress: 10 BD
- Turning: 20 BD
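For readability, the hit criteria and horizons above can be restated as simple predicates. This is a sketch only; the input naming and percent units are assumptions, and the production rule engine is not shown here.

```python
# Hit criteria and scoring horizons from the lists above, restated as
# predicates. Inputs are daily values derived from SPX and VIX; names
# are illustrative, and SPX returns/drawdowns are in percent.
HORIZON_BD = {
    "volatility_spike": 5,
    "stress_advisory": 7,
    "systemic_stress": 10,
    "turning": 20,
}

def systemic_stress_hit(vix, spx_5d, spx_10d, spx_20d):
    return vix >= 35 or spx_5d <= -3.0 or spx_10d <= -6.0 or spx_20d <= -10.0

def volatility_spike_hit(vix, spx_5d, spx_10d):
    return vix >= 30 or spx_5d <= -3.0 or spx_10d <= -5.0

# Stress (Advisory) shares the Volatility Spike thresholds but is
# scored over a 7 BD horizon instead of 5 BD.
stress_advisory_hit = volatility_spike_hit

def turning_hit(spx_10d, drawdown, vix_ge_25_two_days):
    return spx_10d <= -2.0 or drawdown <= -4.0 or vix_ge_25_two_days
```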
Methodology (Public Summary)
Compliance
For institutional research only. Historical, backtested results (2012–2024 unless noted). Not investment advice. See our Disclaimers.
What we backtest
- MSD evaluates five production market-state alarms: Systemic Stress, Volatility Spike, Turning, Stress (Advisory), and a meta Calm state.
- Performance summaries are maintained in a versioned internal validation archive (single source of truth). A full validation packet with current figures and provenance is available on request via the validation packet form.
Data and period
- Historical period: multi‑year history (e.g., 2012–2024) as reflected in the validation files.
- Inputs: the Mindforge Signal Platform time series and public market data (SPX and VIX).
- Time base: business‑day (BD) calendar; timestamps validated and stored in ISO 8601.
What we do / What we don’t do
We do
- Classify market states with fixed, rules‑based definitions
- Evaluate alerts against independent market events (SPX/VIX)
- Use business‑day horizons and deterministic episode logic
- Publish definitions and validation methods publicly
We don’t
- Provide price targets or trade recommendations
- Tune rules on evaluation windows (no in‑sample optimization)
- Score alerts with look‑ahead or calendar‑day shortcuts
- Claim endorsement by data sources or agencies
How an alarm is evaluated (at a glance)
- Fixed rules: Each alarm is specified by an immutable definition (no in‑sample tuning in the backtest).
- Daily evaluation: Conditions are checked per business day against historical inputs.
- Episodes: Consecutive alert days are collapsed into alert “episodes” using business‑day cooldowns and windows.
- Market events: Independent market events (e.g., VIX and SPX patterns) are identified and deduplicated into event episodes.
- Scoring: An alert episode is a “hit” if a qualifying event occurs within its forward‑looking scoring horizon (in business days); an alert episode with no qualifying event in that horizon is a false positive, consistent with the FP definition above (see the sketch after this list).
- Metrics: Precision, recall, and F1 are computed from episodes; figures reported on the website come from these validation files.
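A compact sketch of the episode logic just described, using illustrative business-day indices; the alert cooldown value is an assumed parameter, not the production setting.

```python
# Sketch of the episode logic above: collapse consecutive alert days
# into episodes, then score each episode forward-only against event
# days within its BD horizon. Days are illustrative BD indices.
def collapse_episodes(alert_days, cooldown_bd=3):
    """alert_days: sorted business-day indices on which the alarm fired.
    Days separated by more than `cooldown_bd` start a new episode."""
    episodes = []
    for day in alert_days:
        if episodes and day - episodes[-1][-1] <= cooldown_bd:
            episodes[-1].append(day)
        else:
            episodes.append([day])
    return episodes

def score_episode(episode, event_days, horizon_bd):
    """Hit if any event occurs within `horizon_bd` BD after the
    episode's first alert day (forward-only; no look-ahead)."""
    start = episode[0]
    return any(start <= ev <= start + horizon_bd for ev in event_days)

episodes = collapse_episodes([10, 11, 12, 30], cooldown_bd=3)
hits = [score_episode(ep, event_days=[14, 55], horizon_bd=10) for ep in episodes]
print(episodes, hits)   # [[10, 11, 12], [30]] [True, False]
```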
Design principles that reduce overfitting risk
- No in‑sample optimization: Backtests run the published rule logic as‑is.
- Forward‑only scoring: Hits require future events within a predefined BD horizon; no look‑ahead.
- Deterministic time handling: All windows/horizons/cooldowns are business‑day based.
- Separation of concerns: Historical performance validation files (manifests) serve as the presentation source of truth (SoT) and are not used by evaluation logic.
- Snapshot parity: Internal regression checks ensure outputs remain stable across versions.
Concrete safeguards
- Independent event labeling and deduplication with a fixed event cooldown (10 BD); see the sketch after this list.
- Episode collapsing to prevent double‑counting during prolonged spells.
- Forward‑only scoring horizons (no look‑ahead leakage).
- Business‑day determinism with ISO‑8601 timestamps.
- Rule immutability per run (no parameter tuning inside the backtest).
- Presentation SoT isolation and regression snapshot parity checks.
- Reproducible artifacts: episode‑level CSVs (alerts, events, hits) for independent verification.
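As a further illustration of the first safeguard above, a minimal sketch of event deduplication under the fixed 10 BD cooldown; the event days are illustrative BD indices.

```python
# Sketch of the event deduplication safeguard: qualifying event days
# that fall within the fixed 10 BD cooldown of an already-labeled
# event are merged into that event episode, so one prolonged market
# episode is not counted several times.
def dedup_events(event_days, cooldown_bd=10):
    deduped = []
    for day in sorted(event_days):
        if not deduped or day - deduped[-1] > cooldown_bd:
            deduped.append(day)   # a new event episode starts here
    return deduped

print(dedup_events([100, 101, 105, 130]))   # -> [100, 130]
```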
Data sources
We use public market data (SPX, VIX) and public environmental datasets (e.g., NOAA, NASA, USGS). References to third‑party data providers are for sourcing only and do not imply endorsement.
Validation protocols (walk‑forward and robustness)
- Held‑out era evaluation on excluded historical eras; no tuning on the evaluation period.
- Walk‑forward validation: expanding-window approach (initial 2-year training window, 1-year test window, rolled forward every 90 days; sketched below). Era splits (SC24/SC25) are validated separately.
- Leave‑one‑era‑out to assess robustness by omitting eras and testing excluded windows.
- Era stratification to surface instability and confirm consistency across market conditions.
- Cross‑domain sanity checks where applicable.
- Optional inactive‑day baselines using identical episode windows are available for context in the internal validation materials.
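A hedged sketch of the walk-forward protocol above, assuming the training window starts at roughly two years and then expands while the one-year test window advances every 90 days; calendar handling is simplified and the production splitter works on the BD calendar.

```python
# Illustrative walk-forward split generator under the assumptions
# stated in the lead-in: fixed start, expanding training window,
# one-year test window, split point rolled forward by 90 days.
from datetime import date, timedelta

def walk_forward_splits(start, end, train_days=2 * 365, test_days=365, roll_days=90):
    split = start + timedelta(days=train_days)
    while split + timedelta(days=test_days) <= end:
        yield (start, split), (split, split + timedelta(days=test_days))
        split += timedelta(days=roll_days)   # training expands, test rolls forward

for train, test in list(walk_forward_splits(date(2012, 1, 1), date(2016, 1, 1)))[:3]:
    print("train", train, "test", test)
```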
For complete overfitting controls and offline reproduction procedures, we provide an auditor‑friendly validation packet on request via the validation packet form.
Reproducibility
- Public figures are sourced from versioned internal validation records; website statistics are derived directly from this single source of truth.
- Validation Packet — an auditor‑friendly package to reproduce metrics without environment access. Request via the form and we will deliver the packet by email; certain sensitive artifacts may require an NDA.
Compliance framing
All analytics are informational research or risk‑classification only and must not be construed as investment advice, solicitation, or performance guarantees.
FAQ
Are the results real‑time or backtested?
Figures shown are historical/backtested for 2012–2024 unless otherwise stated. Live operation began in 2025.
Can I see episode‑level data?
We provide episode‑level CSV exports and versioned validation summaries on request via the validation packet form; certain sensitive artifacts may require an NDA.
Is this investment advice?
No. This is an informational classification tool. See Disclaimers for details.
Space Weather Early Warning System (SEWS)
What SEWS Does
SEWS classifies space weather risk using upstream Forbush Decrease detection. It provides probabilistic risk context days ahead of many downstream storm-time indicators and public alerts.
SEWS is designed as a complementary upstream scenario planning layer, not a replacement for official space weather forecasts. It does not predict specific events or their timing.
Not affiliated with NOAA, ESA, NASA, or any government space weather service.
SEWS Performance Definitions
Precision = when the system alerts, how often a cataloged Forbush Decrease event occurs within the scoring horizon (14 calendar days)
Events Caught (Recall) = of all cataloged events in the period, what fraction the system detected
Lead time = calendar days from alert classification to Forbush Decrease onset
Alerts/year = average number of risk classifications per year across the validation period
Validation period = 2010-2024 (15 years, 31 cataloged FD events)
Walk-forward validation = 9-fold expanding-window (train on 2010-2016 base, test on each subsequent year 2016-2024)
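Restated as code, these definitions amount to matching alerts against the cataloged Forbush Decrease onsets within the 14-calendar-day horizon. A minimal sketch; dates and field names are illustrative.

```python
# Sketch of the SEWS definitions above: score each alert against the
# cataloged Forbush Decrease (FD) onsets within a 14-calendar-day
# horizon. Dates are illustrative, not catalog entries.
from datetime import date, timedelta

HORIZON = timedelta(days=14)

def score(alerts, fd_onsets):
    """alerts: alert classification dates. fd_onsets: cataloged FD onsets."""
    hits, leads, caught = 0, [], set()
    for a in alerts:
        matches = [fd for fd in fd_onsets if a <= fd <= a + HORIZON]
        if matches:
            hits += 1
            leads.append((min(matches) - a).days)   # calendar-day lead time
            caught.update(matches)
    precision = hits / len(alerts)           # alerts followed by a cataloged FD
    recall = len(caught) / len(fd_onsets)    # "events caught"
    return precision, recall, leads

alerts = [date(2021, 5, 1), date(2021, 8, 10)]
fd_onsets = [date(2021, 5, 7), date(2021, 11, 3)]
print(score(alerts, fd_onsets))   # (0.5, 0.5, [6])
```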
Tier Performance (Full-Period Backtest, 2010-2024)
- ELEVATED: Wider net. Catches more events. More false alarms. Use for early scenario planning.
- CRITICAL: Selective. Higher confidence. Fewer alerts. Worth escalating monitoring posture.
Both tiers deliver multi-day lead time. The tradeoff is coverage vs. confidence: ELEVATED catches more events with more false alarms; CRITICAL is more selective with fewer alerts.
Walk-Forward Validation (Out-of-Sample)
9-fold walk-forward (train on 2010-2016 base + expanding window, test on each subsequent year 2016-2024). 14 events in walk-forward test periods.
| Tier | Precision (fold-avg) | Events Caught | Lead (event-weighted) |
|---|---|---|---|
| ELEVATED | 17% ± 17% | 79% (11/14 events) | ~6.3 days |
| CRITICAL | 75% ± 14% | 57% (8/14 events) | ~7.6 days |
What validates: Multi-day lead time holds out-of-sample (6-8 days). CRITICAL precision is strong (75% fold-averaged). ELEVATED recall is strong (79%, catches most events).
Known limitations: High fold-to-fold variance on ELEVATED precision (sparse events per year). Small sample (14 events in walk-forward test periods).
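For clarity on the fold-averaged figures in the table, a minimal sketch of the per-fold layout and the mean ± spread computation; the fold boundaries and per-fold values here are illustrative, and standard deviation is assumed for the spread.

```python
# Sketch of the fold-averaged reporting used in the table above:
# precision is computed per fold (one out-of-sample test year at a
# time, expanding training window), then averaged across folds with
# a +/- spread. Boundaries and values below are illustrative only.
from statistics import mean, stdev

test_years = list(range(2016, 2025))   # 9 out-of-sample test years
folds = [(list(range(2010, year)), year) for year in test_years]  # expanding train window

def fold_averaged(per_fold_precision):
    return mean(per_fold_precision), stdev(per_fold_precision)

# Hypothetical per-fold precision values, for illustration only.
print(len(folds), fold_averaged([0.6, 0.8, 1.0, 0.75, 0.5, 0.8, 0.9, 0.7, 0.7]))
```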
SEWS Methodology
Detection approach
SEWS uses environmental stress indicators and public solar and geomagnetic data streams to classify Forbush Decrease risk. Validation is performed against a pre-published event catalog (2010-2024) using a fixed 14-day scoring horizon.
Data sources
- Public space weather and geomagnetic datasets (for example, NOAA SWPC)
- Public CME and solar event catalogs (for example, NASA DONKI)
- Environmental monitoring networks used for event cataloging and cross-checks
- Mindforge Signal Platform (composite indices)
What we do / What we don't do
We do
- Classify environmental risk states (NORMAL / ELEVATED / CRITICAL)
- Deliver probabilistic upstream context days before official alerts
- Walk-forward validate with expanding-window methodology
- Publish tier performance transparently (including false alarm rates)
We don't
- Predict specific events or their timing
- Replace NOAA/ESA confirmed-event forecasts
- Claim affiliation with government space weather services
- Guarantee future performance based on historical validation
Overfitting controls
- Out-of-sample walk-forward validation (9 folds, 2016-2024)
- Separate reporting of full-period backtests (descriptive) vs. walk-forward results (out-of-sample)
- Event‑weighted lead time is used for the out‑of‑sample headline lead time (consistent with the published validation notes; see the sketch after this list)
- Transparent tier tradeoffs: ELEVATED prioritizes event coverage, CRITICAL prioritizes confidence
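To make the lead-time convention concrete, a minimal sketch contrasting event-weighted averaging (pool all caught events) with fold-averaged averaging (average the per-fold means); the lead-time values are illustrative.

```python
# Event-weighted vs. fold-averaged lead time. Event weighting pools
# lead times across all caught events; fold averaging gives each fold
# equal weight even when it contains few events.
from statistics import mean

def event_weighted_lead(folds):
    """folds: list of lists, each holding lead times (days) for the
    events caught in that fold."""
    return mean(lead for fold in folds for lead in fold)

def fold_averaged_lead(folds):
    return mean(mean(fold) for fold in folds if fold)

folds = [[8, 6], [5], [], [9, 7, 6]]      # illustrative lead times
print(event_weighted_lead(folds), fold_averaged_lead(folds))
```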
Informational research only. Not investment, operational, or safety advice. Past performance does not guarantee future results.