Computational · Paper · 2026

Recovering LLM-Persona Accuracies from Unlabeled Votes

Zenodo

Catalog Row: 152
Citation Key: Friedman2026RecoveringLLMPersonaAccuracies152
Paper Folder: Available

Overview

Extracted from the local paper documentation when available.

Algebraic (NTQR) evaluation infers how accurate a group of noisy classifiers was on a finite test using only their responses — no answer key. We test this end to end on real large language models. Three trader "personas" (optimistic, neutral, pessimistic), instantiated as system prompts, each make a binary bullish/bearish call on the same 64 market scenarios; we run the identical trio through six locally-hosted models via Ollama. For each model we recover per-persona, per-label accuracy with ErrorIndependentEvaluation (unsupervised) and score it against the authored ground truth (supervised), which is used only as a check. On the five models whose three judges all varied (mistral:latest, gemma4:latest, gemma3:4b, gemma2:2b, granite4.1:3b), the unsupervised algebra recovered persona accuracies to a mean absolute error of 0.012, within the 0.102 sampling-noise floor across all six per-labe

algebraic evaluation · NTQR · unsupervised evaluation · evaluation on unlabeled data · LLM-as-judge · error-independent evaluation · ensemble evaluability · constant classifier · AI safety warning light · reproducible research · answer-key-free recovery · local large language models
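
The answer-key-free recovery described in the overview can be illustrated with a small, self-contained sketch. It is not the NTQR package's ErrorIndependentEvaluation and does not reproduce the paper's code; it swaps in a closed-form method-of-moments solution for three binary judges, assuming their errors are independent given the true label and that every judge is better than chance. The function names (pattern_probs, recover_accuracies), the +1 = bullish / -1 = bearish encoding, and the example accuracies are all hypothetical. The sketch also shows why a judge whose votes never vary cannot be evaluated this way: its covariance with the other judges is zero, so the equations have no usable solution.

```python
from itertools import product
from math import sqrt


def pattern_probs(prevalence, acc_pos, acc_neg):
    """Exact vote-pattern probabilities for three error-independent judges.

    Votes are +1 (bullish) or -1 (bearish). acc_pos[i] is judge i's accuracy
    on truly bullish items, acc_neg[i] its accuracy on truly bearish items.
    Illustrative helper only (hypothetical, not from the paper).
    """
    probs = {}
    for votes in product((1, -1), repeat=3):
        p_pos, p_neg = prevalence, 1.0 - prevalence
        for v, a, b in zip(votes, acc_pos, acc_neg):
            p_pos *= a if v == 1 else 1.0 - a
            p_neg *= b if v == -1 else 1.0 - b
        probs[votes] = p_pos + p_neg
    return probs


def recover_accuracies(freqs):
    """Recover prevalence and per-label accuracies from vote patterns alone.

    freqs maps each 3-tuple of +1/-1 votes to a count or frequency.
    Assumes independent errors and better-than-chance judges.
    """
    total = sum(freqs.values())

    def mean(f):
        return sum(w * f(v) for v, w in freqs.items()) / total

    mu = [mean(lambda v, i=i: v[i]) for i in range(3)]

    def cov(i, j):
        return mean(lambda v: (v[i] - mu[i]) * (v[j] - mu[j]))

    c01, c02, c12 = cov(0, 1), cov(0, 2), cov(1, 2)
    if min(c01, c02, c12) <= 0:
        # Happens for a constant judge (zero covariance) or a
        # worse-than-chance judge (negative covariance).
        raise ValueError("trio is not evaluable under these assumptions")

    # Third central moment fixes the label prevalence p.
    t = mean(lambda v: (v[0] - mu[0]) * (v[1] - mu[1]) * (v[2] - mu[2]))
    r = t / sqrt(c01 * c02 * c12)
    p = (1.0 - r / sqrt(4.0 + r * r)) / 2.0

    # d[i] is the gap between judge i's mean vote on the two true labels.
    scale = p * (1.0 - p)
    d = [
        sqrt(c01 * c02 / c12 / scale),
        sqrt(c01 * c12 / c02 / scale),
        sqrt(c02 * c12 / c01 / scale),
    ]
    acc_pos = [(1.0 + mu[i] + (1.0 - p) * d[i]) / 2.0 for i in range(3)]
    acc_neg = [(1.0 - mu[i] + p * d[i]) / 2.0 for i in range(3)]
    return p, acc_pos, acc_neg


if __name__ == "__main__":
    # Hypothetical accuracies, not values from the paper.
    probs = pattern_probs(0.4, [0.9, 0.8, 0.7], [0.85, 0.75, 0.65])
    print(recover_accuracies(probs))
```

With exact pattern probabilities as input the recovery is exact up to floating-point error. On a finite test such as 64 scenarios, the observed frequencies carry sampling noise, so the recovered values would only approximate the true accuracies.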

Use Notes

Concise findings and methods pulled from README/SKILL documentation.

Findings / Concepts
  • algebraic evaluation
  • NTQR
  • unsupervised evaluation
  • evaluation on unlabeled data
  • LLM-as-judge
Methods / Techniques
  • Not yet summarized.

Citation

Plain-text citation for quick reuse.

Friedman, Daniel Ari. 2026. Recovering LLM-Persona Accuracies from Unlabeled Votes. Zenodo.

