K2 Pareto Lock, May 2026

This report locks the starting K2 nonlinear-filter frontier before the exchangeability, predictive-consistency, VSMC/FIVO, and flow experiments.

Runs

Family: outputs/cloud_downloads/k2_pareto_lock_family_1000_2026-05-01
Stressors: outputs/cloud_downloads/k2_pareto_lock_stressors_1000_2026-05-01
Config families: active nonlinear family configs plus weak/intermittent/zero/random-normal stressors.
Seeds: 321, 322, 323
Steps: 1000

Family Summary

model	state NLL	pred-y NLL	cov90	state RMSE	var ratio
K2 IWAE h4 k32 + pre-update predictive scoring	4.614	0.971	0.599	3.052	0.895
K2 IWAE h4 k32	4.869	1.004	0.584	3.316	0.918
K2 IWAE h4 k16 + local ADF projection w0.3	5.419	0.927	0.587	3.090	0.809
K2 generic Power-EP alpha 0.5	6.764	0.841	0.640	2.838	0.507
promoted strict Gaussian baseline	4159.987	3217.708	0.479	3.337	0.401

Stressor Summary

model	state NLL	pred-y NLL	cov90	state RMSE	var ratio
K2 IWAE h4 k32	6.005	0.428	0.445	4.507	0.156
K2 IWAE h4 k32 + pre-update predictive scoring	6.049	0.427	0.444	4.503	0.156
K2 generic Power-EP alpha 0.5	8.366	0.394	0.670	3.905	0.362

Interpretation

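The K2 mixture IWAE row (K2 IWAE h4 k32) is the clean baseline to carry forward. It is reference-free, stable across the stressors, and dramatically better than the strict Gaussian baseline on nonlinear-family state density (state NLL 4.869 vs. 4159.987). Power-EP remains a useful predictive/coverage comparator: it has the lowest pred-y NLL and the highest cov90 in both tables, but it gives up too much state density (state NLL 6.764 on the family set and 8.366 on the stressors, against 4.869 and 6.005 for the baseline). Pre-update predictive scoring is worth testing: it improves family state NLL from 4.869 to 4.614 but is effectively tied with the baseline on the stressors (6.049 vs. 6.005), so at Step 0 it is not yet a separate promoted objective.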
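The trade-offs described above can be checked directly from the two summary tables. The following is a minimal sketch, not part of the original run tooling: the numbers are copied from the tables, while the dictionary layout, the `BASELINE` constant, and the `deltas` helper are hypothetical names introduced only for illustration.

```python
# State NLL and pred-y NLL copied from the Family and Stressor Summary tables.
# Each value is a (state NLL, pred-y NLL) pair; lower is better for both.
family = {
    "K2 IWAE h4 k32 + pre-update predictive scoring": (4.614, 0.971),
    "K2 IWAE h4 k32": (4.869, 1.004),
    "K2 IWAE h4 k16 + local ADF projection w0.3": (5.419, 0.927),
    "K2 generic Power-EP alpha 0.5": (6.764, 0.841),
    "promoted strict Gaussian baseline": (4159.987, 3217.708),
}
stressors = {
    "K2 IWAE h4 k32": (6.005, 0.428),
    "K2 IWAE h4 k32 + pre-update predictive scoring": (6.049, 0.427),
    "K2 generic Power-EP alpha 0.5": (8.366, 0.394),
}

# Hypothetical label for the row carried forward as the baseline.
BASELINE = "K2 IWAE h4 k32"


def deltas(table, baseline=BASELINE):
    """Return {model: (state NLL delta, pred-y NLL delta)} relative to the baseline.

    Negative deltas mean the model beats the baseline on that metric.
    """
    base_state, base_pred = table[baseline]
    return {
        name: (round(state - base_state, 3), round(pred - base_pred, 3))
        for name, (state, pred) in table.items()
        if name != baseline
    }


if __name__ == "__main__":
    for label, table in (("family", family), ("stressors", stressors)):
        print(label)
        for name, (d_state, d_pred) in deltas(table).items():
            print(f"  {name}: state NLL {d_state:+.3f}, pred-y NLL {d_pred:+.3f}")
```

Read this way, Power-EP buys a pred-y NLL improvement of 0.163 on the family set at a cost of 1.895 in state NLL, and pre-update predictive scoring gains 0.255 in family state NLL while losing 0.044 on the stressors.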

Decision

Carry direct_mixture_k2_joint_iwae_h4_k32 forward as the baseline. Keep Power-EP as a comparator. Treat pre-update predictive scoring as an objective variant, not as a locked default.