HTML Postprocessing

After extracting the rendered HTML from Chrome, PRISM runs it through a series of transforms using lol_html -- Cloudflare's streaming HTML rewriter. This produces clean, lightweight HTML optimized for search engine consumption.

All transforms are individually configurable under [render.postprocess].

Configuration

[render.postprocess]
enabled = true
strip_scripts = true
strip_noscript = true
strip_comments = true
strip_event_handlers = true
strip_hydration_attrs = true
resolve_lazy_images = true

Transforms

Strip Scripts

Removes all <script> elements except those with type="application/ld+json" (structured data).

Before:

<head>
<script type="application/ld+json">{"@context":"https://schema.org","@type":"Product"}</script>
<script src="/app.bundle.js"></script>
<script>window.__STATE__ = {...}</script>
</head>

After:

<head>
<script type="application/ld+json">{"@context":"https://schema.org","@type":"Product"}</script>
</head>

JSON-LD is preserved because search engines use it for rich results and knowledge panels.
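
The keep-or-drop decision can be illustrated with a short Python sketch. This is not PRISM's implementation (PRISM uses lol_html); the function name strip_scripts and the regex-based matching are assumptions chosen for brevity, and a regex is not a substitute for a real HTML parser.

```python
import re

# Matches a whole <script ...>...</script> element; group 1 holds the raw attributes.
SCRIPT_RE = re.compile(r"<script\b([^>]*)>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
# Detects type="application/ld+json" inside the raw attribute string.
JSON_LD_RE = re.compile(r"""\btype\s*=\s*["']?application/ld\+json["']?""", re.IGNORECASE)


def strip_scripts(html: str) -> str:
    """Drop every <script> element except JSON-LD structured data (illustrative only)."""

    def keep_or_drop(match: re.Match) -> str:
        attrs = match.group(1)
        # Keep the element untouched when it carries JSON-LD, otherwise remove it.
        return match.group(0) if JSON_LD_RE.search(attrs) else ""

    return SCRIPT_RE.sub(keep_or_drop, html)
```

Calling strip_scripts on a head that contains one JSON-LD script and two ordinary scripts leaves only the JSON-LD element in place.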

Strip Noscript

Removes all <noscript> elements. These contain fallback content for browsers without JavaScript, which is irrelevant once the page has already been rendered with JavaScript enabled.

Before:

<noscript>
<p>You need to enable JavaScript to run this app.</p>
</noscript>
<div id="root">
<h1>Welcome</h1>
</div>

After:

<div id="root">
<h1>Welcome</h1>
</div>

Strip Comments

Removes all HTML comments, including framework build markers and conditional comments.

Before:

<!-- build: 2024-01-15T10:30:00Z -->
<!-- react-empty: 1 -->
<p>Content</p>
<!-- end content -->

After:

<p>Content</p>

Strip Event Handlers

Removes inline event handler attributes (onclick, onmouseover, onsubmit, etc.) from all elements. Crawlers reading the pre-rendered HTML do not need to run these handlers, so the attributes are dead weight.

Before:

<button onclick="addToCart()" onmouseover="highlight()">Add to Cart</button>
<form onsubmit="validate()">
<input onfocus="clearError()">
</form>

After:

<button>Add to Cart</button>
<form>
<input>
</form>

The full list of recognized event handlers includes: onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmouseout, onmousemove, onmouseenter, onmouseleave, onkeydown, onkeyup, onkeypress, onfocus, onblur, onchange, onsubmit, onreset, onselect, onload, onunload, onresize, onscroll, onerror, oncontextmenu, drag events, touch events, pointer events, and onwheel.

Attributes that merely contain on but do not match a known event handler name (e.g., data-on) are preserved.
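
A minimal Python sketch of this exact-match rule follows. It is an illustration, not PRISM's code: the names KNOWN_EVENT_HANDLERS and strip_event_handlers are hypothetical, and the set covers only the individually named handlers, leaving out the drag, touch, and pointer event families.

```python
# Hypothetical subset of recognized handlers (drag, touch, and pointer events omitted).
KNOWN_EVENT_HANDLERS = frozenset({
    "onclick", "ondblclick", "onmousedown", "onmouseup", "onmouseover",
    "onmouseout", "onmousemove", "onmouseenter", "onmouseleave",
    "onkeydown", "onkeyup", "onkeypress", "onfocus", "onblur", "onchange",
    "onsubmit", "onreset", "onselect", "onload", "onunload", "onresize",
    "onscroll", "onerror", "oncontextmenu", "onwheel",
})


def strip_event_handlers(attrs: dict) -> dict:
    """Return a copy of attrs without known inline event handlers.

    Matching is by exact (case-insensitive) name, so attributes such as
    data-on are left untouched.
    """
    return {
        name: value
        for name, value in attrs.items()
        if name.lower() not in KNOWN_EVENT_HANDLERS
    }
```

Matching against a fixed set rather than a prefix test is what keeps unrelated attributes safe.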

Strip Hydration Attributes

Removes framework-specific hydration markers that are only needed for client-side re-hydration:

Framework   Attributes Removed
React       data-reactroot, data-reactid, data-react-checksum, data-react-helmet
Vue         data-v-* (scoped CSS hashes), data-server-rendered
Angular     ng-*, _ngcontent-*, _nghost-*

Before (React):

<div data-reactroot="" data-reactid="1">
<h1 data-react-helmet="true">My Page</h1>
</div>

After:

<div>
<h1>My Page</h1>
</div>

Before (Vue):

<div data-v-abc123 data-server-rendered="true">
<span data-v-abc123>Hello</span>
</div>

After:

<div>
<span>Hello</span>
</div>

Before (Angular):

<app-root _nghost-abc="" ng-version="17.0">
<div _ngcontent-abc="">Content</div>
</app-root>

After:

<app-root>
<div>Content</div>
</app-root>

Regular data-* attributes (e.g., data-id, data-name) are preserved.
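
The distinction between hydration markers and ordinary data-* attributes can be sketched as a predicate in Python. The names EXACT_NAMES, PREFIXES, and is_hydration_attr are hypothetical, and the split into exact names and prefixes is an assumption drawn from the attribute list above rather than from PRISM's source.

```python
# Attributes matched by their full name.
EXACT_NAMES = frozenset({
    "data-reactroot", "data-reactid", "data-react-checksum",
    "data-react-helmet", "data-server-rendered",
})
# Attributes matched by prefix (the * patterns in the list above).
PREFIXES = ("data-v-", "ng-", "_ngcontent-", "_nghost-")


def is_hydration_attr(name: str) -> bool:
    """Return True when the attribute name is a framework hydration marker."""
    lowered = name.lower()
    return lowered in EXACT_NAMES or lowered.startswith(PREFIXES)
```

Under these assumptions, data-v-abc123 and ng-version are removed, while data-id and data-name fall through and are kept.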

Resolve Lazy Images

Converts lazy-loaded images to eagerly loaded ones so search engines can see all images. This transform applies to <img> and <source> elements and performs four operations:

1. data-src to src

Copies data-src to src and removes the data-src attribute. Relative URLs are resolved to absolute using the page's base URL.

Before:

<img data-src="/images/product.jpg" src="data:image/gif;base64,R0lGOD..." alt="Product">

After:

<img src="https://example.com/images/product.jpg" alt="Product">
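
The URL resolution step can be approximated in Python with the standard library's urljoin. This is a sketch under assumptions: resolve_data_src is a hypothetical helper, attributes are modelled as a plain dict, and the base URL https://example.com/products/1 in the usage note is made up for illustration.

```python
from urllib.parse import urljoin


def resolve_data_src(attrs: dict, base_url: str) -> dict:
    """Move data-src into src, resolving relative URLs against base_url."""
    resolved = dict(attrs)  # work on a copy so the caller's dict is untouched
    data_src = resolved.pop("data-src", None)
    if data_src:
        # Overwrites any placeholder src (for example an inline data: GIF).
        resolved["src"] = urljoin(base_url, data_src)
    return resolved
```

With base_url set to https://example.com/products/1, a data-src of /images/product.jpg becomes https://example.com/images/product.jpg, and the placeholder src is replaced.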

2. data-srcset to srcset

Copies data-srcset to srcset and removes the data-srcset attribute.

Before:

<img data-srcset="/img/small.jpg 400w, /img/large.jpg 800w">

After:

<img srcset="/img/small.jpg 400w, /img/large.jpg 800w">

3. Remove loading="lazy"

Removes the loading="lazy" attribute so the browser fetches images immediately.

Before:

<img src="/photo.jpg" loading="lazy" alt="Photo">

After:

<img src="/photo.jpg" alt="Photo">

4. Remove Lazy CSS Classes

Strips known lazy-loading CSS classes from <img> and <source> elements: lazyload, lazy, unveil, lazyloaded, ls-blur-up-img. Other classes on the same element are preserved. If the class attribute becomes empty, it is removed entirely.

Before:

<img class="hero lazyload fade-in" src="/photo.jpg">

After:

<img class="hero fade-in" src="/photo.jpg">

Note: Lazy class stripping is scoped to <img> and <source> tags only. A <div class="lazy"> is not affected.
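
The class filtering itself is simple enough to sketch in Python. The helper name strip_lazy_classes is hypothetical, and returning None to signal "remove the class attribute" is an assumption made for this illustration; the caller is expected to apply it only to <img> and <source> elements.

```python
from typing import Optional

LAZY_CLASSES = frozenset({"lazyload", "lazy", "unveil", "lazyloaded", "ls-blur-up-img"})


def strip_lazy_classes(class_value: str) -> Optional[str]:
    """Remove known lazy-loading classes from a class attribute value.

    Returns the remaining classes joined by single spaces, or None when
    nothing is left and the attribute should be dropped entirely.
    """
    kept = [name for name in class_value.split() if name not in LAZY_CLASSES]
    return " ".join(kept) if kept else None
```

For example, "hero lazyload fade-in" becomes "hero fade-in", while a value of just "lazy" yields None.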

Disabling Postprocessing

To disable all postprocessing (for example, in render-all mode where humans need interactive scripts):

[render.postprocess]
enabled = false

Or disable individual transforms:

[render.postprocess]
enabled = true
strip_scripts = false # Keep scripts for hydration
strip_hydration_attrs = false # Keep React/Vue markers
strip_event_handlers = false # Keep interactivity
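
How the master switch and the per-transform flags might combine can be sketched in Python. This assumes the [render.postprocess] table has already been parsed into a dict, and that any key left out defaults to true; that default is an assumption, as are the names DEFAULTS and active_transforms.

```python
# Assumed defaults: every flag on, mirroring the Configuration example.
DEFAULTS = {
    "enabled": True,
    "strip_scripts": True,
    "strip_noscript": True,
    "strip_comments": True,
    "strip_event_handlers": True,
    "strip_hydration_attrs": True,
    "resolve_lazy_images": True,
}


def active_transforms(postprocess: dict) -> list:
    """Return the names of the transforms that would run for this config."""
    settings = {**DEFAULTS, **postprocess}
    if not settings["enabled"]:
        return []  # master switch off: skip every transform
    return [name for name in DEFAULTS if name != "enabled" and settings[name]]
```

With the second configuration above, this sketch yields strip_noscript, strip_comments, and resolve_lazy_images.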