
Error Handling

Handle errors gracefully in evaluations.

Automatic Error Handling

Understand how Benchwise automatically manages and reports errors during evaluations.

Benchwise catches errors raised during an evaluation and records them on the corresponding result instead of aborting the run:

import asyncio
from benchwise import evaluate

# "invalid-model" will fail; Benchwise records the error rather than raising
@evaluate("gpt-4", "invalid-model")
async def my_test(model, dataset):
    responses = await model.generate(dataset.prompts)
    return {"responses": responses}

results = asyncio.run(my_test(dataset))

# Check for failures
for result in results:
    if not result.success:
        print(f"Error in {result.model_name}: {result.error}")

Custom Error Handling

Implement custom error handling logic within your evaluation functions.

@evaluate("gpt-4")
async def robust_test(model, dataset):
try:
responses = await model.generate(dataset.prompts)
return {"responses": responses}
except Exception as e:
# Custom error handling
return {"error": str(e), "partial_results": None}

Retry Logic

Implement retry logic to make evaluations robust against transient failures.

import asyncio

@evaluate("gpt-4")
async def test_with_retry(model, dataset):
    max_retries = 3

    for attempt in range(max_retries):
        try:
            responses = await model.generate(dataset.prompts)
            return {"responses": responses}
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # Exponential backoff
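
If several evaluations need the same retry behavior, the loop can be factored out. A minimal sketch in plain asyncio; the with_retry helper and its signature are illustrative, not part of Benchwise:

import asyncio

async def with_retry(make_call, max_retries=3, base_delay=1.0):
    """Await make_call(), retrying with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            return await make_call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

@evaluate("gpt-4")
async def test_with_helper(model, dataset):
    responses = await with_retry(lambda: model.generate(dataset.prompts))
    return {"responses": responses}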

Custom Exceptions

Learn about Benchwise's custom exception classes for more granular error management.

from benchwise import load_dataset  # assuming load_dataset is exported at the top level
from benchwise.exceptions import BenchwiseError, ModelError, DatasetError

try:
    dataset = load_dataset("invalid.json")
except DatasetError as e:
    print(f"Dataset error: {e}")

try:
    responses = await model.generate(prompts)  # inside an async evaluation function
except ModelError as e:
    print(f"Model error: {e}")
