Blocked topics and prompt injection

Reject off-topic or abusive questions before they ever reach the language model — saving tokens and keeping conversations on-brand.

Two layers of blocking

Clarifier rejects two kinds of unwanted questions, both before any AI runs:

Prompt injection (always on): attempts to override the assistant's instructions, using phrases like "ignore previous instructions", "you are now a", or "show your prompt". A multilingual default list catches common attacks in English, Danish, German, French, and Spanish. This protection is built in and can't be turned off.
Blocked topics (you configure): phrases you don't want the assistant to engage with, such as "medical advice", "investment recommendation", or the names of competitors. Add them in the widget configuration; each phrase is matched as a case-insensitive substring.
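
The two layers can be pictured as a pair of checks that run before any model call. The sketch below is illustrative only, not Clarifier's actual code: the function name `rejection_reason`, the return labels, and the order of the checks are assumptions, and the injection list contains only the three English examples quoted above.

```python
from typing import Optional

# Illustrative excerpt: these three phrases are the examples from the text.
# The real built-in list is multilingual and is not reproduced here.
INJECTION_PHRASES = (
    "ignore previous instructions",
    "you are now a",
    "show your prompt",
)


def rejection_reason(message: str, blocked_topics: list[str]) -> Optional[str]:
    """Return the layer that rejects the message, or None if it may proceed.

    Hypothetical helper; the check order is an assumption.
    """
    text = message.casefold()

    # Layer 1: prompt injection, always on and not configurable.
    if any(phrase in text for phrase in INJECTION_PHRASES):
        return "prompt_injection"

    # Layer 2: blocked topics supplied in the widget configuration.
    # Empty entries are skipped so they cannot match every message.
    if any(topic.casefold() in text for topic in blocked_topics if topic):
        return "blocked_topic"

    return None
```

Only when a check like this returns `None` would the question go on to retrieval and the language model.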

How matching works

Each blocked phrase is matched as a case-insensitive substring against the visitor's message. If any phrase is found anywhere in the message, the question is rejected. Use short, distinctive phrases — single common words will over-match. Questions longer than 500 characters are also rejected automatically as a basic abuse guard.
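
A minimal sketch of the matching rule, assuming nothing beyond what is described above: a 500-character length guard and a case-insensitive substring test. The names `is_rejected` and `MAX_QUESTION_LENGTH` are hypothetical, and skipping blank phrases is an added assumption.

```python
MAX_QUESTION_LENGTH = 500  # from the text: longer questions are rejected


def is_rejected(message: str, blocked_phrases: list[str]) -> bool:
    """Hypothetical helper mirroring the rule described in the text."""
    # Basic abuse guard: reject anything longer than 500 characters.
    if len(message) > MAX_QUESTION_LENGTH:
        return True

    text = message.casefold()
    # Case-insensitive substring match; blank phrases are ignored (assumption).
    return any(p.casefold() in text for p in blocked_phrases if p.strip())


print(is_rejected("Can I sue my landlord?", ["sue"]))             # True
print(is_rejected("I have an issue with my order", ["sue"]))      # True (over-match)
print(is_rejected("I have an issue with my order", ["lawsuit"]))  # False
```

The second call shows why single short words are risky: "sue" appears inside "issue", so an ordinary support question is rejected.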

Why it saves money

Rejected questions never reach the language model. No retrieval, no LLM call, no tokens. The rejection is generated locally in milliseconds. On a busy site, blocking even a small fraction of off-topic questions adds up — every blocked message is a few cents you didn't spend.
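
To get a rough feel for the saving, multiply the number of messages by the share that is blocked and by the cost of one answered question. The function below is a back-of-the-envelope sketch; the name `estimated_monthly_savings` is hypothetical, and the figures in the example are placeholders, not Clarifier pricing or measured data.

```python
def estimated_monthly_savings(
    messages_per_month: int,
    blocked_fraction: float,
    cost_per_answer_usd: float,
) -> float:
    """Rough estimate: blocked messages times the cost of answering one."""
    return messages_per_month * blocked_fraction * cost_per_answer_usd


# Placeholder inputs: 10,000 messages, 5% blocked, 2 cents per answer.
print(f"${estimated_monthly_savings(10_000, 0.05, 0.02):.2f}")
```

With those placeholder inputs the estimate comes to about $10 per month; substitute your own traffic and per-answer cost.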

Common patterns to block

What's worth blocking depends on your business, but these are common starting points:

Medical and health advice

If you're not a healthcare provider, blocking "medical advice", "diagnose", "prescription" keeps you out of regulated territory.

Legal advice

Block "legal advice" and "lawsuit" if you're not a law firm; these answers carry real liability. Be careful with "sue" on its own: as a substring it also matches "issue" and "pursue".

Financial recommendations

Block "investment advice", "should I buy", "financial advice" unless you're licensed for it.

Competitor questions

Block competitor names if you don't want the assistant comparing your product to theirs based on potentially outdated info.

What the visitor sees

When a question is rejected — by either layer — the visitor sees a short, polite message:

"I can only answer questions about this website's content. How can I help you with that?"

What blocking won't catch

Substring matching is simple by design. It won't catch creative paraphrases: "what should I take for a headache" gets through if you only blocked "medical advice". For high-stakes topics, combine blocked topics with handoff keywords so the assistant escalates to a human instead of refusing outright.
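
The gap, and one way to cover it, can be sketched as a small routing function. This page does not describe how handoff keywords work, so everything here is an assumption made for illustration: the function name `route`, the three outcome labels, the order of the checks, and the use of "headache" as a handoff keyword.

```python
def route(message: str, blocked_topics: list[str], handoff_keywords: list[str]) -> str:
    """Hypothetical router: returns "reject", "handoff", or "answer"."""
    text = message.casefold()

    # Blocked topics are checked first (assumed precedence).
    if any(t.casefold() in text for t in blocked_topics if t):
        return "reject"

    # Handoff keywords catch paraphrases that slipped past the blocked list.
    if any(k.casefold() in text for k in handoff_keywords if k):
        return "handoff"

    return "answer"


blocked = ["medical advice"]
handoff = ["headache"]  # hypothetical keyword
print(route("Can you give me medical advice?", blocked, handoff))     # reject
print(route("What should I take for a headache?", blocked, handoff))  # handoff
print(route("What are your opening hours?", blocked, handoff))        # answer
```

The second message contains no blocked phrase, so the blocked list alone would let it through; the handoff keyword sends it to a human instead.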