GPM bridges the gap between the efficiency of Bradley-Terry models and the expressiveness of pairwise comparison models.

Figure 1: (a) The BT model assigns a scalar reward to each response. (b) Pairwise models (PairRM/PairPM) evaluate every pair, leading to $\mathcal{O}(K^2)$ complexity. (c) Our GPM embeds each response once and computes preference scores via vector interactions, achieving $\mathcal{O}(K)$ complexity.
The core idea is to represent each response $y$ to a prompt $x$ as a preference embedding $v_{y|x} \in \mathbb{R}^{2k}$. The preference score between two responses, $y_i$ and $y_j$, is then computed with a skew-symmetric operator $R^{>}$:
$s(y_i > y_j | x) = \langle R^{>} v_{y_i|x}, v_{y_j|x} \rangle$
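The score can be sketched in a few lines of NumPy. As an illustrative assumption (not spelled out above), $R^{>}$ is taken to be block-diagonal with $k$ copies of the $2 \times 2$ rotation block $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which is one standard way to build a skew-symmetric operator on $\mathbb{R}^{2k}$; the embeddings here are random placeholders rather than model outputs.

```python
import numpy as np

def skew_operator(k):
    """Block-diagonal skew-symmetric operator on R^{2k}:
    k copies of the 2x2 rotation block [[0, -1], [1, 0]]."""
    block = np.array([[0.0, -1.0], [1.0, 0.0]])
    return np.kron(np.eye(k), block)

def preference_score(v_i, v_j, R):
    """s(y_i > y_j | x) = <R v_i, v_j>."""
    return float((R @ v_i) @ v_j)

k = 2
R = skew_operator(k)
rng = np.random.default_rng(0)
v_i = rng.standard_normal(2 * k)  # placeholder embedding for y_i
v_j = rng.standard_normal(2 * k)  # placeholder embedding for y_j

# Skew-symmetry of R gives s(y_i > y_j) = -s(y_j > y_i) for free.
assert np.isclose(preference_score(v_i, v_j, R),
                  -preference_score(v_j, v_i, R))
```

Because each response is embedded once and scores are cheap inner products, scoring $K$ responses needs only $K$ forward passes, in contrast to the $\mathcal{O}(K^2)$ evaluations of pairwise models.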
Because $R^{>}$ is skew-symmetric, $s(y_i > y_j | x) = -s(y_j > y_i | x)$ holds by construction. This formulation allows GPM to capture complex relationships, including intransitive (e.g., cyclic) preferences, which no scalar reward can represent: the model is fully expressive for any real skew-symmetric preference score matrix.
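A minimal sketch of such a cyclic preference, under the simplifying assumption that $k = 1$ so embeddings live on the unit circle in $\mathbb{R}^2$ and $R^{>}$ is a single $90°$ rotation block. The score then reduces to $s(y_i > y_j) = \sin(\theta_j - \theta_i)$, and three responses spaced $120°$ apart form a rock-paper-scissors cycle that no scalar reward model could reproduce:

```python
import numpy as np

# k = 1: R^> is a single 90-degree rotation block, so
# s(y_i > y_j) = <R v_i, v_j> = sin(theta_j - theta_i).
R = np.array([[0.0, -1.0], [1.0, 0.0]])

def embed(theta):
    """Unit-circle embedding at angle theta (illustrative placeholder)."""
    return np.array([np.cos(theta), np.sin(theta)])

def score(v_i, v_j):
    return float((R @ v_i) @ v_j)

# Three responses placed 120 degrees apart.
a, b, c = (embed(t) for t in (0.0, 2 * np.pi / 3, 4 * np.pi / 3))

assert score(a, b) > 0  # a preferred over b
assert score(b, c) > 0  # b preferred over c
assert score(c, a) > 0  # c preferred over a -- an intransitive cycle
```

A scalar reward would require $r(a) > r(b) > r(c) > r(a)$, a contradiction, which is exactly why the BT model cannot express this cycle while the embedding formulation can.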