Blengi docs

System health & diagnostics

Every workspace admin gets a self-serve diagnostic at /app/system-health. The page is read-only (it does not call any external API and cannot mutate state), and it surfaces the four config gaps that account for ~90% of "the bot isn't working" support tickets:

  1. LLM provider: resolves the bound OpenAiClient. Green when Workers AI / OpenAI / OpenRouter is configured. Red when the deterministic FakeOpenAi is in play (no real provider).
  2. Cloudflare creds: checks CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_API_TOKEN. Amber when only one is present; green when both are.
  3. Embedding fallback: checks OPENAI_API_KEY. Amber when missing: the install works on the happy path, but the next Workers AI rate limit will halt indexing.
  4. Queue worker: surfaces the queue driver, pending count, and failed-job count. Amber when the queue driver is sync (fine for dev, broken for prod, where SSE + crawl jobs need their own workers).
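The card rules above can be sketched as four small pure functions. This is an illustrative Python sketch, not the application's own code: the function names, the use of a class-name string for the bound client, and the "unspecified" return value are assumptions. The source only states the colors listed above, so cases it does not cover (for example, neither Cloudflare credential being set) are marked rather than guessed.

```python
from typing import Mapping


def llm_provider_card(bound_client: str) -> str:
    """Red when the deterministic FakeOpenAi is bound, green for a real provider."""
    return "red" if bound_client == "FakeOpenAi" else "green"


def cloudflare_creds_card(env: Mapping[str, str]) -> str:
    """Green when both credentials are set, amber when only one is."""
    keys = ("CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_API_TOKEN")
    present = sum(1 for key in keys if env.get(key))
    if present == 2:
        return "green"
    if present == 1:
        return "amber"
    # The docs do not say what the card shows when neither is set.
    return "unspecified"


def embedding_fallback_card(env: Mapping[str, str]) -> str:
    """Amber when OPENAI_API_KEY is missing (green otherwise is an assumption)."""
    return "green" if env.get("OPENAI_API_KEY") else "amber"


def queue_worker_card(driver: str) -> str:
    """Amber when the queue driver is sync (green otherwise is an assumption)."""
    return "amber" if driver == "sync" else "green"
```

Because each function only reads values handed to it, the sketch stays consistent with the page's read-only guarantee: nothing here calls an external API or writes state.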

Source-level problems

Beneath the four cards is a list of every Source in the workspace whose status is failed or stuck on crawling, ordered by most recently updated. IndexDocumentJob stamps the exception class name into each row's error column, so the admin can distinguish a rate limit ([OpenAiRateLimitException]), a bad request ([OpenAiBadRequestException]), and a Cloudflare 40040 (Vectorize provisioning lag).
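As a rough illustration of that list, the Python sketch below filters and orders sources and pulls the bracketed exception class out of an error string. The Source fields, the function names, and the sample error text are hypothetical. The sketch also treats every source in the crawling status as a candidate, because the docs do not state how "stuck" is determined.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Source:
    # Hypothetical shape; the real model's columns are not documented here.
    name: str
    status: str
    updated_at: datetime
    error: str = ""


def problem_sources(sources: List[Source]) -> List[Source]:
    """Keep failed/crawling sources, most recently updated first."""
    flagged = [s for s in sources if s.status in ("failed", "crawling")]
    return sorted(flagged, key=lambda s: s.updated_at, reverse=True)


def exception_class(error: str) -> Optional[str]:
    """Return the first bracketed class name in an error string, if any."""
    match = re.search(r"\[(\w+)\]", error)
    return match.group(1) if match else None
```

For example, exception_class("[OpenAiRateLimitException] too many requests") returns "OpenAiRateLimitException", which is enough to separate a rate limit from a bad request at a glance.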

Cloudflare Workers AI: non-Llama models

Workers AI's /v1/chat/completions endpoint is OpenAI-compatible in name, but model families diverge in two ways that affect the bot:

  • Streaming shape varies. Llama 3.3 emits the OpenAI SSE shape (data: {"choices":[{"delta":{"content":"..."}}]}). Other families (Mistral, some Gemma builds, Qwen variants) emit either NDJSON without the data: prefix or a CF native {"response":"..."} shape on the same endpoint. WorkersAiClient::streamChat() handles all three (since 2026-05-29) so any chat-capable Workers AI model streams tokens.
  • Not every Workers AI model has a chat endpoint. Embedding, classification, and Whisper models silently return empty bodies on /v1/chat/completions. When the Settings → System chat probe surfaces "No content via streaming OR non-streaming", the configured CLOUDFLARE_CHAT_MODEL isn't a chat model; switch to any @cf/meta/llama-* slug or a Mistral instruct variant.
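To make the three streaming shapes concrete, here is a minimal Python sketch of a per-line token extractor. It is not the WorkersAiClient::streamChat() implementation. The function names are hypothetical, and the sketch assumes the NDJSON variant carries the same OpenAI-style choices/delta object without the data: prefix. It also skips the [DONE] sentinel that OpenAI-style SSE streams use to mark the end.

```python
import json
from typing import Iterable, List, Optional


def extract_token(line: str) -> Optional[str]:
    """Return the text token carried by one stream line, or None."""
    line = line.strip()
    if line.startswith("data:"):
        # OpenAI SSE shape: strip the prefix, then parse like NDJSON.
        line = line[len("data:"):].strip()
    if not line or line == "[DONE]":
        return None
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    # Cloudflare-native shape: {"response": "..."}
    response = payload.get("response")
    if isinstance(response, str):
        return response or None
    # OpenAI shape: {"choices": [{"delta": {"content": "..."}}]}
    choices = payload.get("choices") or []
    if choices and isinstance(choices[0], dict):
        content = (choices[0].get("delta") or {}).get("content")
        if isinstance(content, str) and content:
            return content
    return None


def collect_tokens(lines: Iterable[str]) -> List[str]:
    """Extract every non-empty token from a sequence of stream lines."""
    return [tok for tok in map(extract_token, lines) if tok is not None]
```

Normalizing the prefix first means the SSE and NDJSON cases share one parsing path, so only the payload shape needs to be distinguished afterwards.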

The chat probe runs streaming first; if streaming yields zero tokens it retries non-streaming. Three outcomes:

  • Both work → green, surfaces the model's reply.
  • Streaming empty, non-streaming worked → red, points the operator at a streaming-capable slug. The bot still works in non-streaming fallback paths, but the visitor SSE path needs a streaming-capable model to be useful.
  • Both empty → red, the configured model has no OpenAI-compat chat endpoint.
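The probe's decision flow can be sketched in Python as follows. The two callables stand in for the streaming and non-streaming requests and are hypothetical, as are the ProbeResult type and the wording of the middle message. The sketch calls the non-streaming path only when streaming yields nothing; whether the real probe also exercises non-streaming in the green case is not stated here.

```python
from typing import Callable, NamedTuple


class ProbeResult(NamedTuple):
    color: str
    message: str


def run_chat_probe(
    stream_chat: Callable[[], str],
    chat: Callable[[], str],
) -> ProbeResult:
    """Try streaming first, then fall back to non-streaming."""
    streamed = stream_chat()
    if streamed:
        # Streaming produced tokens: surface the model's reply.
        return ProbeResult("green", streamed)
    reply = chat()
    if reply:
        # The model answers, but not over the streaming path.
        return ProbeResult(
            "red",
            "Streaming returned no tokens; pick a streaming-capable model slug.",
        )
    # Neither path produced content: not a chat model.
    return ProbeResult("red", "No content via streaming OR non-streaming")
```

Passing the two requests in as callables keeps the decision logic testable without a network call, for example run_chat_probe(lambda: "", lambda: "") exercises the both-empty branch.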

Access control

Workspace owners and admins can load the page; viewers get a 403. The sidebar entry under "Run your workspace" only appears for users who can see the route.
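A minimal Python sketch of that rule, assuming roles are plain strings named "owner", "admin", and "viewer" (the real role representation and function names are not documented here):

```python
ALLOWED_ROLES = {"owner", "admin"}


def system_health_status_code(role: str) -> int:
    """HTTP status a user with this role gets for /app/system-health."""
    return 200 if role in ALLOWED_ROLES else 403


def shows_sidebar_entry(role: str) -> bool:
    """The sidebar entry appears only for users who can load the route."""
    return system_health_status_code(role) == 200
```

Deriving the sidebar visibility from the same check as the route keeps the two from drifting apart.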