Bot Detection Configuration

The [bot] section configures how PRISM identifies bot user agents. In bot-only mode, only requests from detected bots are rendered; all other requests are proxied directly to the origin.

TOML Example

[bot]
patterns = [
# Search engines
"Googlebot",
"Googlebot-Image",
"Googlebot-Video",
"Googlebot-News",
"Storebot-Google",
"Google-InspectionTool",
"GoogleOther",
"bingbot",
"Baiduspider",
"YandexBot",
"DuckDuckBot",
"DuckAssistBot",
"Slurp",
"Applebot",
"Applebot-Extended",
"PetalBot",
"Sogou",
"SeznamBot",
"Amazonbot",
"Bravebot",
# AI/LLM crawlers
"GPTBot",
"ChatGPT-User",
"OAI-SearchBot",
"ClaudeBot",
"Claude-User",
"Claude-SearchBot",
"anthropic-ai",
"PerplexityBot",
"Bytespider",
"meta-externalagent",
"Meta-ExternalFetcher",
"FacebookBot",
"CCBot",
"DeepSeekBot",
"cohere-ai",
"Diffbot",
"YouBot",
"PhindBot",
"FirecrawlAgent",
"Timpibot",
"ImagesiftBot",
# Social / link preview
"facebookexternalhit",
"Twitterbot",
"LinkedInBot",
"Pinterestbot",
"Discordbot",
"WhatsApp",
"TelegramBot",
"Slackbot",
"redditbot",
"Snap URL Preview",
"Bluesky",
"Mastodon",
"Viber",
"kakaotalk-scrap",
"Iframely",
"FlipboardProxy",
# SEO tools
"AhrefsBot",
"SemrushBot",
"MJ12bot",
"DotBot",
"DataForSeoBot",
"ContentKingApp",
"Screaming Frog",
"Embedly",
"Quora Link Preview",
# Archive
"ia_archiver",
]

Parameters

Parameter   Type               Default                      Description
patterns    Array of Strings   (60+ patterns, see above)    User-agent substrings that identify bot traffic

Detection Strategy

PRISM uses a two-layer bot detection approach:

  1. Primary: isbot crate -- PRISM first checks the User-Agent header against the isbot library, which maintains a comprehensive, regularly updated database of known bot signatures.

  2. Fallback: patterns list -- If isbot does not match, PRISM checks whether the User-Agent contains any of the configured pattern strings as a substring. Matching is case-insensitive.

This dual approach ensures reliable detection even for new or niche crawlers not yet in the isbot database.
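The two layers above can be sketched as a single function. This is an illustrative standalone sketch, not PRISM's actual implementation: the function names are hypothetical, and the closure stands in for the isbot crate's database lookup.

```rust
// Hypothetical sketch of PRISM's two-layer check. `isbot_matches`
// stands in for the isbot crate's lookup (layer 1); the pattern scan
// is the configured fallback (layer 2).
fn is_bot(
    user_agent: &str,
    patterns: &[&str],
    isbot_matches: impl Fn(&str) -> bool,
) -> bool {
    // Layer 1: known-bot signature database.
    if isbot_matches(user_agent) {
        return true;
    }
    // Layer 2: case-insensitive substring match against configured patterns.
    let ua_lower = user_agent.to_lowercase();
    patterns.iter().any(|p| ua_lower.contains(&p.to_lowercase()))
}

fn main() {
    let patterns = ["Googlebot", "MyCustomCrawler"];
    let db_miss = |_: &str| false; // simulate a crawler unknown to isbot

    // The fallback still catches it via the configured pattern.
    assert!(is_bot(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        &patterns,
        db_miss,
    ));
    // A regular browser User-Agent matches neither layer.
    assert!(!is_bot(
        "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
        &patterns,
        db_miss,
    ));
}
```

Because layer 1 is checked first, the configured patterns only affect requests the isbot database does not already classify as bots.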

Detailed Explanation

Pattern matching

Each entry in the patterns array is treated as a case-insensitive substring match against the full User-Agent header. For example, the pattern "Googlebot" matches:

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Googlebot-Image/1.0
  • Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1)
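The matching rule itself reduces to a lowercase substring test. A minimal sketch (not PRISM's actual code) that verifies all three User-Agents above:

```rust
// Case-insensitive substring rule: lowercase both sides, then check
// containment. Standalone illustration, not PRISM's implementation.
fn matches_pattern(user_agent: &str, pattern: &str) -> bool {
    user_agent.to_lowercase().contains(&pattern.to_lowercase())
}

fn main() {
    let uas = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Googlebot-Image/1.0",
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1)",
    ];
    for ua in uas {
        assert!(matches_pattern(ua, "Googlebot"));
        assert!(matches_pattern(ua, "googlebot")); // case does not matter
    }
}
```

Note that substring matching means a short pattern like "Googlebot" also covers more specific agents such as Googlebot-Image, so the narrower entries in the default list are redundant safety rather than strictly required.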

Default pattern categories

The default list covers four major categories:

  • Search engines (20 patterns): Google, Bing, Baidu, Yandex, DuckDuckGo, Apple, and regional search engines
  • AI/LLM crawlers (21 patterns): GPTBot, ClaudeBot, PerplexityBot, DeepSeekBot, and other AI training/search crawlers
  • Social/link preview (16 patterns): Facebook, Twitter/X, LinkedIn, Discord, WhatsApp, Telegram, Slack, Reddit, Bluesky, Mastodon
  • SEO tools and archives (10 patterns): Ahrefs, Semrush, Screaming Frog, Internet Archive

Overriding the default list

Setting patterns in your config replaces the entire default list. If you only want to add a custom bot, you must include the defaults plus your additions:

[bot]
patterns = [
"Googlebot",
"bingbot",
# ... include defaults you need ...
"MyCustomCrawler",
]

Example Use Cases

Minimal bot list for testing

[bot]
patterns = ["Googlebot", "bingbot"]

Adding a custom internal crawler

[bot]
patterns = [
# Keep all defaults plus your custom bot
"Googlebot",
"bingbot",
"Baiduspider",
"YandexBot",
# ... other defaults ...
"InternalMonitorBot",
"MyCompanyCrawler",
]

Rendering for all traffic (no bot detection needed)

If you use render-all mode, the bot patterns list is not consulted for routing decisions, but it is still used for analytics and the X-Prism-Bot response header.

[server]
mode = "render-all"

# Bot patterns are still used for tagging, not routing
[bot]
patterns = ["Googlebot", "bingbot"]