Perplexity

Calculate the perplexity of generated text using a pre-trained language model. Lower perplexity generally indicates more fluent and predictable text.

Signature

def perplexity(predictions: List[str], model_name: str = "gpt2") -> Dict[str, float]:
...

Parameters

  • predictions (List[str]): List of predicted texts for which to calculate perplexity.
  • model_name (str, optional): The name of the pre-trained language model to use for perplexity calculation (e.g., "gpt2", "distilgpt2"). Defaults to "gpt2".

Returns

Dictionary containing:

  • mean_perplexity (float): The average perplexity score across all predictions.
  • median_perplexity (float): The median perplexity score across all predictions.
  • scores (List[float]): A list of individual perplexity scores for each prediction.
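Each individual score is the standard perplexity of one text: the exponential of the mean negative log-likelihood of its tokens under the model. The actual metric computes token log-probabilities with the chosen transformers model; the sketch below is a minimal, model-free illustration of the formula itself, with the helper name `perplexity_from_log_probs` being hypothetical.

```python
import math
from typing import List

def perplexity_from_log_probs(token_log_probs: List[float]) -> float:
    """Perplexity of one text: exp of the mean negative
    log-likelihood of its tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns every token probability 0.5 yields a
# perplexity of (approximately) 2: the model is, on average, as
# uncertain as a fair coin flip at each token.
log_probs = [math.log(0.5)] * 4
print(perplexity_from_log_probs(log_probs))
```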

Usage

from benchwise import perplexity

predictions = [
    "The quick brown fox jumps over the lazy dog.",
    "Bacon ipsum dolor amet short ribs."
]

result = perplexity(predictions)
print(f"Mean Perplexity: {result['mean_perplexity']:.2f}")
print(f"Individual Scores: {result['scores']}")
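The two aggregates are simply the mean and median of the per-text scores. Reporting both is useful because a single hard-to-predict text can inflate the mean while the median stays stable; the sketch below uses hypothetical score values to show the difference.

```python
import statistics

# Hypothetical per-text perplexities, as in result["scores"];
# the third text is an outlier the model found very surprising.
scores = [12.4, 35.1, 980.2]

# The mean is pulled up by the outlier; the median is robust to it.
print(statistics.mean(scores))
print(statistics.median(scores))  # → 35.1
```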

In Evaluations

from benchwise import evaluate, perplexity

@evaluate("gpt-4")
async def test_perplexity(model, dataset):
    responses = await model.generate(dataset.prompts)
    scores = perplexity(responses)

    return {
        "mean_perplexity": scores["mean_perplexity"],
        "median_perplexity": scores["median_perplexity"]
    }

Installation

This metric requires the transformers and torch packages. Install them using:

pip install 'benchwise[transformers]'
# or
pip install transformers torch

See Also