K2 Pareto Lock, May 2026

This report locks the starting K2 nonlinear-filter frontier before the exchangeability, predictive-consistency, VSMC/FIVO, and flow experiments.

Runs

Family Summary

model                                           state NLL  pred-y NLL  cov90  state RMSE  var ratio
K2 IWAE h4 k32 + pre-update predictive scoring      4.614       0.971  0.599       3.052      0.895
K2 IWAE h4 k32                                      4.869       1.004  0.584       3.316      0.918
K2 IWAE h4 k16 + local ADF projection w0.3          5.419       0.927  0.587       3.090      0.809
K2 generic Power-EP alpha 0.5                       6.764       0.841  0.640       2.838      0.507
promoted strict Gaussian baseline                4159.987    3217.708  0.479       3.337      0.401

Stressor Summary

model                                           state NLL  pred-y NLL  cov90  state RMSE  var ratio
K2 IWAE h4 k32                                      6.005       0.428  0.445       4.507      0.156
K2 IWAE h4 k32 + pre-update predictive scoring      6.049       0.427  0.444       4.503      0.156
K2 generic Power-EP alpha 0.5                       8.366       0.394  0.670       3.905      0.362

Interpretation

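The K2 mixture IWAE row is the clean baseline to carry forward. It is reference-free, stable across the stressors, and dramatically better than the strict Gaussian baseline on nonlinear family state density (state NLL 4.869 versus 4159.987). Power-EP remains a useful predictive/coverage comparator, with the lowest pred-y NLL and the highest cov90 in both summaries, but it gives up too much state density (state NLL 6.764 on the family summary and 8.366 on the stressors, against 4.869 and 6.005 for the baseline). Pre-update predictive scoring is worth testing: it lowers family state NLL from 4.869 to 4.614 but is essentially tied with the baseline on the stressors (6.049 versus 6.005). At Step 0 it is not yet a separate promoted objective.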
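The trade-off between state density and predictive score can be checked mechanically. The sketch below is illustrative only: it assumes that lower is better for both state NLL and pred-y NLL, and it uses just those two columns, because the report does not state which metrics or targets define the locked frontier. The names Row, dominates, and pareto_front are hypothetical helpers, not part of any experiment code.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    model: str
    state_nll: float   # assumed: lower is better
    pred_y_nll: float  # assumed: lower is better


# Values copied from the Family Summary table.
FAMILY = [
    Row("K2 IWAE h4 k32 + pre-update predictive scoring", 4.614, 0.971),
    Row("K2 IWAE h4 k32", 4.869, 1.004),
    Row("K2 IWAE h4 k16 + local ADF projection w0.3", 5.419, 0.927),
    Row("K2 generic Power-EP alpha 0.5", 6.764, 0.841),
    Row("promoted strict Gaussian baseline", 4159.987, 3217.708),
]

# Values copied from the Stressor Summary table.
STRESSOR = [
    Row("K2 IWAE h4 k32", 6.005, 0.428),
    Row("K2 IWAE h4 k32 + pre-update predictive scoring", 6.049, 0.427),
    Row("K2 generic Power-EP alpha 0.5", 8.366, 0.394),
]


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.state_nll <= b.state_nll and a.pred_y_nll <= b.pred_y_nll
    strictly_better = a.state_nll < b.state_nll or a.pred_y_nll < b.pred_y_nll
    return no_worse and strictly_better


def pareto_front(rows: List[Row]) -> List[Row]:
    """Return the rows that no other row dominates, in their original order."""
    return [r for r in rows if not any(dominates(other, r) for other in rows)]


if __name__ == "__main__":
    for name, rows in (("family", FAMILY), ("stressor", STRESSOR)):
        print(name)
        for r in pareto_front(rows):
            print(f"  {r.model}: state NLL {r.state_nll}, pred-y NLL {r.pred_y_nll}")
```

On these two columns, the family front consists of the pre-update predictive scoring variant, the local ADF projection variant, and Power-EP, while all three stressor rows are mutually non-dominated. cov90, state RMSE, and var ratio are left out because their targets are not stated in this report.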

Decision

Carry direct_mixture_k2_joint_iwae_h4_k32 (the K2 IWAE h4 k32 row) forward as the baseline. Keep Power-EP as a comparator. Treat pre-update predictive scoring as an objective variant, not as a locked default.