Exchangeable K2/K3 Strict Mixtures, May 2026

Source artifact rendered for reading.

This report tests whether the K2 frontier is limited by component count or by a non-exchangeable parameterization.

Runs

Family Summary

model state NLL pred-y NLL cov90 state RMSE var ratio
K3 mixture IWAE h4 k32 4.525 0.931 0.603 3.042 0.909
K2 mixture IWAE h4 k32 4.859 1.008 0.584 3.303 0.922
K2 generic Power-EP alpha 0.5 6.764 0.841 0.640 2.838 0.507
exchangeable K2 mixture IWAE h4 k32 8.002 1.064 0.593 3.365 2.209
exchangeable K3 mixture IWAE h4 k32 8.146 1.011 0.592 3.626 3.398

Stressor Summary

model state NLL pred-y NLL cov90 state RMSE var ratio
K3 mixture IWAE h4 k32 5.965 0.423 0.440 4.545 0.159
K2 mixture IWAE h4 k32 6.370 0.428 0.428 4.579 0.149
K2 generic Power-EP alpha 0.5 8.366 0.394 0.670 3.905 0.362

Interpretation

K3 improves state density and predictive-y modestly over K2, both on the family grid and on stressors. This says component count still matters. The explicitly exchangeable parameterization, however, is not an immediate win: it improves neither state NLL nor predictive-y and inflates variance ratio.

Decision

K3 is a valid Pareto candidate. The exchangeable parameterization should be treated as a diagnostic negative result, not a promoted model. Further K growth should be justified by a specific diagnostic rather than broadened by default.