Overview
Extracted from the local README when available.
Abstract. This paper formulates on-policy distillation as active inference in finite variational models, with exact claims only for declared objects and interpretive claims explicitly bounded outside them. In the construction, the intractable teacher policy plays the role of the generative model $p(o,s)$, the tractable student policy is the approximate posterior $q(s)$, and the per-token reverse-KL distillation loss is variational free energy up to the evidence constant, $F = D_{\mathrm{KL}}(q\,\|\,p(s\mid o)) - \log p(o)$, whose KL target is the teacher-induced posterior $p(s\mid o)\propto p(o,s)$. The title's "as" is therefore a scoped mathematical correspondence rather than the slogan OPD = Active Inference. Variational free energy names the realized-rollout distillation loss; expected free energy remains the planning-side objective by which the pymdp agent selects actions. On-policy
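
The free-energy identity quoted in the abstract can be checked numerically on a small finite model. The sketch below is illustrative only and is not taken from the paper or from pymdp: the three-state joint `joint`, the student distribution `q`, and the helper names `free_energy` and `kl` are hypothetical. It computes $F = \sum_s q(s)\,[\log q(s) - \log p(o,s)]$ for one fixed observation $o$ and compares it with $D_{\mathrm{KL}}(q\,\|\,p(s\mid o)) - \log p(o)$.

```python
import math

def free_energy(q, joint):
    """Variational free energy F = sum_s q(s) * (log q(s) - log p(o, s)) for a fixed o."""
    return sum(qs * (math.log(qs) - math.log(ps))
               for qs, ps in zip(q, joint) if qs > 0)

def kl(q, p):
    """KL divergence D_KL(q || p) between two finite distributions."""
    return sum(qs * (math.log(qs) - math.log(ps))
               for qs, ps in zip(q, p) if qs > 0)

# Hypothetical toy numbers (assumptions, not from the paper):
# joint[s] = p(o, s) for one fixed observation o over three states.
joint = [0.10, 0.25, 0.05]
# q[s] = student distribution (approximate posterior) over the same states.
q = [0.2, 0.5, 0.3]

evidence = sum(joint)                        # p(o)
posterior = [ps / evidence for ps in joint]  # p(s | o), proportional to p(o, s)

F = free_energy(q, joint)
decomposed = kl(q, posterior) - math.log(evidence)

# The two expressions agree up to floating-point error.
assert math.isclose(F, decomposed)
# F is bounded below by -log p(o), with equality when q is the exact posterior.
assert F >= -math.log(evidence)
assert math.isclose(free_energy(posterior, joint), -math.log(evidence))
```

Because $-\log p(o)$ does not depend on $q$, minimizing $F$ over $q$ in this toy setting is the same as minimizing the reverse KL to the teacher-induced posterior, which is the sense of "up to the evidence constant" above.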
Artifacts
Tracked documentation and PDFs served directly from this folder.
- Friedman_2026_Onpolicy_c6b5ec49.pdf 8,102,450 bytes