Как обрабатывать ошибки LLM в продакшне?

Question

Никита Вихров · Answer

# Как обрабатывать ошибки LLM в продакшне

LLM API — внешний сервис. Он падает, тормозит, возвращает rate limit и иногда отвечает мусором. Без нормальной обработки ошибок твоё приложение падает вместе с ним.

---

**Какие ошибки бывают**

```python
from anthropic import (
    Anthropic,
    APIConnectionError,    # сеть недоступна
    APITimeoutError,       # запрос завис
    RateLimitError,        # превышен лимит запросов
    APIStatusError,        # HTTP ошибка от API (4xx, 5xx)
    AuthenticationError,   # неверный API ключ
)
```

---

**Базовая обработка с retry**

Большинство ошибок — временные. Rate limit и 503 проходят сами через несколько секунд:

```python
import time
import random
from anthropic import Anthropic, RateLimitError, APIStatusError, APIConnectionError

client = Anthropic()

def llm_call_with_retry(
    messages: list,
    model: str = "claude-opus-4-5",
    max_retries: int = 3,
    base_delay: float = 1.0
) -> str:

for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages
            )
            return response.content[0].text

except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff с jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limit, жду {delay:.1f}с (попытка {attempt + 1}/{max_retries})")
            time.sleep(delay)

except APIStatusError as e:
            if e.status_code in (503, 529):  # перегрузка сервера
                if attempt == max_retries - 1:
                    raise
                delay = base_delay * (2 ** attempt)
                time.sleep(delay)
            else:
                raise  # 400, 401, 404 — не ретраим

except (APIConnectionError, APITimeoutError):
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

raise RuntimeError("Все попытки исчерпаны")
```

---

**Fallback на более дешёвую модель**

Если основная модель недоступна — падай на резервную:

```python
MODELS_FALLBACK = [
    "claude-opus-4-5",
    "claude-sonnet-4-5",
    "claude-haiku-4-5",
]

def llm_call_with_fallback(messages: list) -> tuple[str, str]:
    """Возвращает (ответ, использованная_модель)"""

last_error = None

for model in MODELS_FALLBACK:
        try:
            response = client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages
            )
            return response.content[0].text, model

except Exception as e:
            print(f"Модель {model} недоступна: {e}")
            last_error = e
            continue

raise RuntimeError(f"Все модели недоступны. Последняя ошибка: {last_error}")

answer, used_model = llm_call_with_fallback(messages)
if used_model != "claude-opus-4-5":
    print(f"⚠️ Использована резервная модель: {used_model}")
```

---

**Таймаут — обязателен**

Без таймаута запрос может висеть минуты:

```python
import anthropic

# Таймаут на уровне клиента
client = anthropic.Anthropic(
    timeout=30.0  # 30 секунд на весь запрос
)

# Или точнее — разные таймауты на подключение и чтение
client = anthropic.Anthropic(
    timeout=anthropic.Timeout(
        connect=5.0,   # подключение
        read=25.0,     # ожидание ответа
        write=5.0,     # отправка данных
        pool=5.0       # ожидание соединения из пула
    )
)
```

---

**Логирование для отладки**

```python
import logging
import time

logger = logging.getLogger(__name__)

def llm_call_logged(messages: list, request_id: str) -> str:
    start = time.time()

try:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            messages=messages
        )

duration = time.time() - start
        logger.info(
            "llm_call_success",
            extra={
                "request_id": request_id,
                "duration_ms": int(duration * 1000),
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "model": response.model,
            }
        )

return response.content[0].text

except Exception as e:
        duration = time.time() - start
        logger.error(
            "llm_call_failed",
            extra={
                "request_id": request_id,
                "duration_ms": int(duration * 1000),
                "error_type": type(e).__name__,
                "error_message": str(e),
            }
        )
        raise
```

---

**Минимальный продакшн-паттерн**

```python
def production_llm_call(messages: list, request_id: str) -> str:
    try:
        return llm_call_with_retry(messages)
    except RateLimitError:
        # Пользователю — дружелюбное сообщение
        raise ServiceUnavailableError("Сервис перегружен, попробуйте через минуту")
    except APIStatusError as e:
        logger.error(f"API error {e.status_code}", extra={"request_id": request_id})
        raise ServiceUnavailableError("Временная ошибка, попробуйте позже")
    except (APIConnectionError, APITimeoutError):
        raise ServiceUnavailableError("Не удалось подключиться к сервису")
```

Пользователь никогда не должен видеть сырые ошибки API.

---

На курсе [«ИИ для разработчиков»](https://ru.hexlet.io/programs/ai-for-developers) на Хекслете разбирают как строить надёжные LLM-приложения: retry-стратегии, мониторинг, graceful degradation в продакшне.

Как обрабатывать ошибки LLM в продакшне?

Ответы

Как обрабатывать ошибки LLM в продакшне

Каталог