Content Validation

PRISM includes a content quality gate that validates rendered HTML before caching and serving it. This prevents broken or empty pages from being indexed by search engines.

Configuration

[render.content_validation]
enabled = true
min_text_length = 100
require_title = true
min_html_bytes = 1024

Validation Checks

min_html_bytes

The minimum total size of the rendered HTML, in bytes. If the output is smaller than this threshold, validation fails. This catches cases where Chrome returned an error page or the SPA failed to render.

Default: 1024 bytes
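
As an illustration only, the size check can be sketched in a few lines of Python. The function name `passes_min_html_bytes` is hypothetical, and measuring the UTF-8 encoded length is an assumption; PRISM's actual implementation may differ.

```python
def passes_min_html_bytes(html: str, min_html_bytes: int = 1024) -> bool:
    """Return True when the rendered HTML meets the byte-size threshold.

    Hypothetical helper, not PRISM's API. Assumes the size is measured
    on the UTF-8 encoding of the rendered document.
    """
    # Count bytes rather than characters: a multi-byte character
    # contributes more than one byte to the total.
    size_in_bytes = len(html.encode("utf-8"))
    # Validation fails only when the output is smaller than the threshold.
    return size_in_bytes >= min_html_bytes
```

Under these assumptions, a page of exactly 1024 bytes passes and a page of 1023 bytes fails.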

min_text_length

The minimum number of visible text characters that must remain after all HTML tags are stripped. This ensures the page has meaningful content rather than an empty shell of markup.

Default: 100 characters
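
The following is a minimal sketch of how visible text might be counted, using Python's standard `html.parser`. The names `_TextExtractor` and `visible_text_length` are hypothetical. Skipping `<script>` and `<style>` contents and collapsing whitespace are assumptions; the documentation only states that HTML tags are stripped.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring <script> and <style> contents (assumption)."""

    _SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text_length(html: str) -> int:
    """Return the number of text characters left after tags are removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so indentation does not inflate the count
    # (an assumption, not documented PRISM behavior).
    return len(" ".join("".join(parser.chunks).split()))
```

With this sketch, an unrendered shell such as `<div id="root"></div>` yields a length of 0 and would fail any positive `min_text_length`.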

require_title

When enabled, PRISM checks that the rendered HTML contains a non-empty <title> tag. Pages without titles are a strong signal that the SPA did not finish rendering.

Default: true
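
A title check along these lines could look like the sketch below. `_TitleFinder` and `has_non_empty_title` are hypothetical names; treating a whitespace-only title as empty and inspecting only the first `<title>` element are assumptions.

```python
from html.parser import HTMLParser


class _TitleFinder(HTMLParser):
    """Captures the text of the first <title> element in the document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def has_non_empty_title(html: str) -> bool:
    """Return True when the first <title> contains non-whitespace text."""
    finder = _TitleFinder()
    finder.feed(html)
    finder.close()
    return bool("".join(finder.title_parts).strip())
```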

What Happens on Failure

When content validation fails:

  1. The rendered HTML is discarded (not cached, not served).
  2. The request falls back to the origin response: the unrendered SPA HTML is proxied directly to the client.
  3. The circuit breaker is not tripped. Content validation failures are treated as soft failures because the Chrome render itself succeeded; the output simply did not meet quality standards.
  4. A warning is logged with the failure reason.
  5. The content_validation_failures metric counter is incremented.

This design ensures that a misconfigured SPA or a temporary data-loading failure does not cause PRISM to stop rendering entirely.
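
The failure handling above can be sketched as follows. This is an illustration under stated assumptions, not PRISM's code: `choose_response`, the `Metrics` class, the dictionary cache, and the logger name are all hypothetical. Only the `content_validation_failures` counter name comes from the list above.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Optional

logger = logging.getLogger("prism.content_validation")  # hypothetical logger name


@dataclass
class Metrics:
    # Mirrors the content_validation_failures counter named in the docs.
    content_validation_failures: int = 0


def choose_response(
    url: str,
    rendered_html: str,
    origin_html: str,
    failure_reason: Optional[str],
    cache: Dict[str, str],
    metrics: Metrics,
) -> str:
    """Return the HTML to serve, given the validation outcome."""
    if failure_reason is None:
        # Validation passed: cache and serve the rendered output.
        cache[url] = rendered_html
        return rendered_html

    # Steps 1 and 2: the rendered HTML is neither cached nor served;
    # the origin response is returned instead.
    # Step 3: nothing here touches a circuit breaker, since this is a soft failure.
    # Step 4: log a warning with the failure reason.
    logger.warning("content validation failed for %s: %s", url, failure_reason)
    # Step 5: increment the failure counter.
    metrics.content_validation_failures += 1
    return origin_html
```

In this sketch a failed validation leaves the cache untouched, so the next request for the same URL triggers a fresh render attempt.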

When to Enable

Enable content validation (the default) when:

  • You are running in bot-only mode and want to ensure bots always see quality content.
  • Your SPA occasionally fails to load data, resulting in empty shells.
  • You want to catch rendering regressions before they affect SEO.

When to Disable

Disable content validation when:

  • You are running in render-all mode and serving all visitors. In this mode, you may prefer to serve whatever Chrome rendered rather than falling back to the unrendered SPA.
  • Your pages legitimately have very little text content (e.g., image galleries, video pages).
  • You have pages without <title> tags by design.

To turn validation off:

[render.content_validation]
enabled = false

Tuning Thresholds

For SPAs with short pages, lower the thresholds:

[render.content_validation]
enabled = true
min_text_length = 50 # Short product descriptions
require_title = true
min_html_bytes = 512 # Smaller pages

For content-heavy sites, raise them:

[render.content_validation]
enabled = true
min_text_length = 500 # Ensure full article body rendered
require_title = true
min_html_bytes = 4096 # Larger pages expected
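
To see how these profiles change the outcome for the same page, the sketch below evaluates pre-measured page statistics against each set of thresholds. The dictionaries mirror the TOML values shown above. The function `failure_reasons` is hypothetical, and reporting every failed check (rather than stopping at the first) is an assumption.

```python
from typing import Dict, List

DEFAULTS = {
    "enabled": True,
    "min_text_length": 100,
    "require_title": True,
    "min_html_bytes": 1024,
}
# Profiles matching the two tuning examples above.
SHORT_PAGES = {**DEFAULTS, "min_text_length": 50, "min_html_bytes": 512}
CONTENT_HEAVY = {**DEFAULTS, "min_text_length": 500, "min_html_bytes": 4096}


def failure_reasons(
    html_bytes: int, text_length: int, has_title: bool, cfg: Dict[str, object]
) -> List[str]:
    """Return the names of the checks that fail for the given page statistics."""
    if not cfg["enabled"]:
        return []  # validation is switched off entirely
    reasons = []
    if html_bytes < cfg["min_html_bytes"]:
        reasons.append("min_html_bytes")
    if text_length < cfg["min_text_length"]:
        reasons.append("min_text_length")
    if cfg["require_title"] and not has_title:
        reasons.append("require_title")
    return reasons


# A short product page: 800 bytes of HTML, 70 visible characters, with a title.
print(failure_reasons(800, 70, True, DEFAULTS))       # fails both size checks
print(failure_reasons(800, 70, True, SHORT_PAGES))    # passes
print(failure_reasons(800, 70, True, CONTENT_HEAVY))  # fails both size checks
```

The same page is rejected under the defaults and accepted under the short-page profile, which is the reason to tune thresholds per site rather than disabling validation outright.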