
Error Handling

Handle errors gracefully in evaluations.

Automatic Error Handling

Understand how Benchwise automatically manages and reports errors during evaluations.

Benchwise catches errors raised during an evaluation and records them on the corresponding result instead of aborting the run:

import asyncio
from benchwise import evaluate

# "invalid-model" will fail; Benchwise records the error rather than raising
@evaluate("gpt-4", "invalid-model")
async def my_test(model, dataset):
    responses = await model.generate(dataset.prompts)
    return {"responses": responses}

results = asyncio.run(my_test(dataset))

# Check for failures
for result in results:
    if not result.success:
        print(f"Error in {result.model_name}: {result.error}")

Custom Error Handling

Implement custom error handling logic within your evaluation functions.

@evaluate("gpt-4")
async def robust_test(model, dataset):
try:
responses = await model.generate(dataset.prompts)
return {"responses": responses}
except Exception as e:
# Custom error handling
return {"error": str(e), "partial_results": None}

Retry Logic

Implement retry logic to make evaluations robust against transient failures.

import asyncio

@evaluate("gpt-4")
async def test_with_retry(model, dataset):
    max_retries = 3

    for attempt in range(max_retries):
        try:
            responses = await model.generate(dataset.prompts)
            return {"responses": responses}
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # Exponential backoff
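
If several evaluations need the same retry behavior, the loop can be factored out. A minimal sketch in plain asyncio; the with_retry helper and its signature are illustrative, not part of Benchwise:

import asyncio

async def with_retry(make_call, max_retries=3, base_delay=1.0):
    """Await make_call(), retrying with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            return await make_call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

@evaluate("gpt-4")
async def test_with_helper(model, dataset):
    responses = await with_retry(lambda: model.generate(dataset.prompts))
    return {"responses": responses}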

Custom Exceptions

Learn about Benchwise's custom exception classes for more granular error management.

from benchwise import load_dataset  # assuming load_dataset is exported at the top level
from benchwise.exceptions import BenchwiseError, ModelError, DatasetError

try:
    dataset = load_dataset("invalid.json")
except DatasetError as e:
    print(f"Dataset error: {e}")

try:
    responses = await model.generate(prompts)  # inside an async evaluation function
except ModelError as e:
    print(f"Model error: {e}")
