overrideHost www.example.com demo-proxy.asteno.workers.dev
navigate https://www.example.com/test-page.html
I start with a simple boilerplate worker and, as the transforms tend to be bespoke for each site, I create a separate worker for each site I'm testing.
The boilerplate script for the worker follows this pattern:
- serve a robots.txt that disallows crawlers
- return an error if the x-host header is missing
- if the request is for a predefined site, the browser is expecting an HTML response, and the x-bypass-transform header isn't set to true, use an HTMLRewriter to modify the response
- otherwise, just proxy the request
/* Started from Pat's example in https://www.slideshare.net/patrickmeenan/getting-the-most-out-of-webpagetest */
/*
 * TODO
 * Add mimetype to robots.txt
 * Add a better doc check, perhaps use a header instead?
 */
const site = 'www.example.com';
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request));
});
async function handleRequest(request) {
  const url = new URL(request.url);
  // Disallow crawlers
  if (url.pathname === '/robots.txt') {
    return new Response('User-agent: *\nDisallow: /', { status: 200 });
  }
  // When overrideHost is used in a script, WPT sets x-host to the original host, i.e. the site we want to proxy
  const host = request.headers.get('x-host');
  // Error if x-host header missing
  if (!host) {
    return new Response('x-host header missing', { status: 403 });
  }
  url.hostname = host;
  const bypassTransform = request.headers.get('x-bypass-transform');
  const acceptHeader = request.headers.get('accept');
  // If it's the original document, and we don't want to bypass the rewrite of HTML
  // TODO will also select sub-documents e.g. iframes, from the same site :-(
  if (host === site &&
      (acceptHeader && acceptHeader.indexOf('text/html') >= 0) &&
      (!bypassTransform || bypassTransform.indexOf('true') === -1)) {
    const response = await fetch(url.toString(), request);
    return new HTMLRewriter()
      .on('selector', new exampleElementHandler())
      .transform(response);
  }
  // Otherwise just proxy the request
  return fetch(url.toString(), request);
}
/*
 * Example handler: replace 'selector' above with a real CSS selector
 * and modify the matched element here
 */
class exampleElementHandler {
  element(element) {
    // Do something
  }
}
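Both TODOs in the boilerplate come from relying on the Accept header, which browsers also send for iframes. These are the "better doc check" TODO and the TODO about sub-documents being matched.

Chromium-based browsers also send a Sec-Fetch-Dest request header. Its value is "document" for top-level navigations and "iframe" for frames. Below is a minimal sketch that prefers that header when present and falls back to the original Accept check for clients such as curl. It assumes the header actually reaches the worker in WebPageTest's Chrome runs. The helper names isDocumentRequest and shouldTransform are hypothetical.

```javascript
// Hypothetical helpers for the boilerplate worker. `headers` is anything with
// a get(name) method, e.g. request.headers in the Workers runtime.
// Lookups use lowercase header names.

// True if the request looks like a top-level HTML document.
function isDocumentRequest(headers) {
  // Chromium-based browsers send Sec-Fetch-Dest: "document" for top-level
  // navigations and "iframe" for frames, which separates the two cases.
  const dest = headers.get('sec-fetch-dest');
  if (dest) {
    return dest === 'document';
  }
  // Fallback for clients that don't send Fetch Metadata headers (e.g. curl):
  // the original Accept-based check.
  const accept = headers.get('accept');
  return Boolean(accept && accept.includes('text/html'));
}

// Mirrors the worker's condition: right site, a document request, and no
// x-bypass-transform header containing "true".
function shouldTransform(host, site, headers) {
  const bypass = headers.get('x-bypass-transform');
  const bypassed = Boolean(bypass && bypass.includes('true'));
  return host === site && isDocumentRequest(headers) && !bypassed;
}
```

With these helpers, the condition in handleRequest could become if (shouldTransform(host, site, request.headers)). The acceptHeader and bypassTransform variables would then no longer be needed there.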
Example Transforms
The transforms I'm using are fairly straightforward and mainly consist of unsharding domains, reordering elements in the page, or delaying when a resource loads.
Sometimes it's possible to manipulate an existing element in the page, sometimes an element has to be deleted and a replacement inserted elsewhere in the page.
Requesting frameworks, libraries, etc. from 3rd-party CDNs such as cdnjs and jsDelivr is still very common across many of the customers I work with.
Requesting these from another origin involves creating a new connection and, as HTTP/2 prioritisation only works within a single connection, they may compete for the network with other resources.
One of the first tests I try is directing these requests through the proxy, so they're on the same origin as the page too:
overrideHost www.example.com demo-proxy.asteno.workers.dev
overrideHost ajax.googleapis.com demo-proxy.asteno.workers.dev
navigate https://www.example.com/test-page.html
The proxy could be improved to cache these libraries on Cloudflare, removing the request to the origin for them; one of Pat Meenan's workers has an example of how to do this.
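Here is a minimal sketch of that improvement. It is not taken from Pat's worker. It assumes the Workers runtime's Cloudflare-specific cf options on fetch: cacheEverything and cacheTtl.

The host list, the one-day TTL and the name cdnFetchOptions are illustrative choices. Caching behaviour can differ depending on how a worker is deployed, so it's worth checking Cloudflare's caching documentation for any limits that apply to workers.dev.

```javascript
// Illustrative list of third-party library hosts whose responses the proxy
// may cache at Cloudflare's edge; adjust per site under test.
const CDN_HOSTS = new Set([
  'ajax.googleapis.com',
  'cdnjs.cloudflare.com',
  'cdn.jsdelivr.net',
]);

// Assumed TTL: versioned library URLs rarely change, so one day is conservative.
const CDN_CACHE_TTL_SECONDS = 24 * 60 * 60;

// Returns extra fetch() init options for the given upstream host.
// In the Workers runtime, cf.cacheEverything asks Cloudflare to cache the
// response regardless of file type, and cf.cacheTtl overrides the edge TTL.
// Other hosts get an empty object, i.e. default proxy behaviour.
function cdnFetchOptions(host) {
  if (!CDN_HOSTS.has(host)) {
    return {};
  }
  return { cf: { cacheEverything: true, cacheTtl: CDN_CACHE_TTL_SECONDS } };
}
```

In the boilerplate's fallthrough branch, this could be used as return fetch(new Request(url.toString(), request), cdnFetchOptions(host)). The method, headers and body still come from the incoming request, and only the caching options are added.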
Clients often use 3rd-party services that don't need to be loaded until the visitor has a usable page. Sometimes these provide outward-facing features such as chat or feedback widgets; other times they may be internal-facing, session replay for example.
I'll often defer the load for these types of services by moving them into a tag manager and initiating their insertion using the Window Loaded trigger in Google Tag Manager (GTM).
In one recent example, HotJar was loaded via an async snippet at the start of the head:
(function(h, o, t, j, a, r) {
  h.hj = h.hj || function() { (h.hj.q = h.hj.q || []).push(arguments) };
  h._hjSettings = { hjid: xxxxxx, hjsv: x };
  a = o.getElementsByTagName('head')[0];
  r = o.createElement('script'); r.async = 1;
  r.src = t + h._hjSettings.hjid + j + h._hjSettings.hjsv;
  a.appendChild(r);
})(window, document, 'https://static.hotjar.com/c/hotjar-', '.js?sv=');
To delay HotJar's loading and simulate it being implemented via GTM, I wrapped the HotJar snippet in a native event listener for the window load event.
class deferInlineScript {
  element(element) {
    // Wrap the inline snippet in a window load listener;
    // html: true inserts the strings as-is rather than as escaped text
    const wrapperStart = "window.addEventListener('load', function() {";
    const wrapperEnd = "});";
    element.prepend(wrapperStart, { html: true });
    element.append(wrapperEnd, { html: true });
  }
}
Qubit's SmartServe is quite a large tag and, even when loaded async, competes for network bandwidth and CPU time in ways that impact performance.
One site I tested implemented the SmartServe tag near the top of the <head>, before any stylesheets.
<script src='//static.goqubit.com/smartserve-xxxx.js' async defer></script>
Its fetch was initiated soon after the page started loading and competed with higher-priority, render-blocking resources, so I wanted to move the element much later in the <head>.
This type of change becomes a two-stage process, where one handler removes the script element and a second reinserts it (just before the end of the head).
// In handleRequest: remove the original tag, then append a copy to the end of <head>
return new HTMLRewriter()
  .on('script[src="//static.goqubit.com/smartserve-xxxx.js"]', new removeSmartServe())
  .on('head', new reinsertSmartServe())
  .transform(response);
class removeSmartServe {
  element(element) {
    element.remove();
  }
}
class reinsertSmartServe {
  element(element) {
    // append() on <head> inserts the content just before </head>
    const text = '<script src="//static.goqubit.com/smartserve-xxxx.js" async defer></script>';
    element.append(text, { html: true });
  }
}
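The head handler above appends the SmartServe tag unconditionally, with hard-coded attributes. That means any HTML page on the site gets the tag, even one that never had it.

Below is a sketch that avoids both problems. It assumes the Workers runtime's element.tagName, element.attributes (an iterator of [name, value] pairs) and element.onEndTag(). The onEndTag() callback fires when </head> is reached, after any script inside the head has been seen. It also assumes a single matching element and an explicit </head> in the markup. The names moveToEndOfHead and buildTag are hypothetical.

```javascript
// Serialise a tag name and [name, value] attribute pairs back into HTML.
// Empty values are written as bare boolean attributes (async, defer, ...).
function buildTag(name, attributes) {
  const attrs = attributes.map(([attr, value]) =>
    value === '' ? attr : `${attr}="${value.replace(/"/g, '&quot;')}"`
  );
  return `<${[name, ...attrs].join(' ')}></${name}>`;
}

// Returns a pair of handlers sharing per-request state: `remover` deletes the
// matched element and remembers it, `head` re-emits it just before </head>.
// Create a fresh pair for every request, as the original handlers are.
function moveToEndOfHead() {
  let moved = null;
  return {
    remover: {
      element(element) {
        moved = buildTag(element.tagName, [...element.attributes]);
        element.remove();
      },
    },
    head: {
      element(element) {
        // Runs at <head>; defer the decision until </head> is reached.
        element.onEndTag((endTag) => {
          // Only reinsert if the element was actually found in the page.
          if (moved) {
            endTag.before(moved, { html: true });
          }
        });
      },
    },
  };
}
```

In handleRequest, this would be wired up with const move = moveToEndOfHead(). The rewriter chain then passes move.remover for the SmartServe selector and move.head for 'head'.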
Testing
In initial testing I tend to start with host overrides in WebPageTest, then switch to curl or a browser when developing the HTML rewriting script, and finally switch back to WebPageTest to run before and after comparisons.
It's also an iterative process where I'll make some initial changes, test and refine them until I'm happy with their impact, and then start around the loop again.
To test the HTML rewriting using curl, both the x-host and accept headers need to be set appropriately.
curl -H "x-host: www.example.com" -H "accept: text/html" https://demo-proxy.asteno.workers.dev/test-page.html
Piping curl's output to a file or a utility like less makes it easier to read.
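To see what the transforms actually changed, one option is to fetch the page through the proxy twice. The first request sets x-bypass-transform: true, and the second is transformed. The script then lists the lines that differ.

This is a rough sketch assuming Node 18 or later, for its global fetch. PROXY is the demo worker URL from the curl example, and the function names are hypothetical. The comparison is deliberately crude, not a real diff: heavily minified HTML on a single line will show up as one changed line.

```javascript
// Demo proxy URL from the curl example above; replace with your own worker.
const PROXY = 'https://demo-proxy.asteno.workers.dev';

// Fetch a page through the proxy, optionally asking the worker to skip its
// HTML transforms (the worker looks for "true" in x-bypass-transform).
async function fetchViaProxy(path, host, bypass) {
  const headers = { 'x-host': host, accept: 'text/html' };
  if (bypass) {
    headers['x-bypass-transform'] = 'true';
  }
  const res = await fetch(PROXY + path, { headers });
  return res.text();
}

// Lines present in only one version; ignores ordering and duplicate lines.
function changedLines(before, after) {
  const a = new Set(before.split('\n'));
  const b = new Set(after.split('\n'));
  return {
    removed: [...a].filter((line) => !b.has(line)),
    added: [...b].filter((line) => !a.has(line)),
  };
}

// Fetch baseline and transformed HTML in parallel and compare them.
async function compareTransform(path, host) {
  const [baseline, transformed] = await Promise.all([
    fetchViaProxy(path, host, true),
    fetchViaProxy(path, host, false),
  ]);
  return changedLines(baseline, transformed);
}
```

For example, compareTransform('/test-page.html', 'www.example.com').then(console.log) would print the removed and added lines for the test page.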
For in-browser testing of HTML rewriting I've been using Chrome, setting the x-host header with the ModHeader extension and then loading the page via the proxy, i.e. https://demo-proxy.asteno.workers.dev/test-page.html
This approach only allows the initial host to be overridden, so it can't be used to unshard domains.
Finally, when I'm happy with the host overrides and HTML rewrites, I switch back to WebPageTest and generate before (baseline) and after tests.
I've found that some sites get faster when proxied through Cloudflare's network, so I still use the proxy when generating a baseline for comparison, but set the x-bypass-transform header to true so the HTML transforms aren't applied.
setHeader x-bypass-transform: true
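Keeping the baseline and after scripts identical, apart from that header, avoids comparing runs that differ in more than the transforms. Below is a sketch that builds both scripts from one list of hosts; the function name is hypothetical.

It joins commands and parameters with tabs because WebPageTest's scripting documentation describes scripts as tab-delimited. This matters for setHeader, whose parameter contains a space.

```javascript
// Builds a WebPageTest script that routes each host through the proxy and
// navigates to the test page. With bypass = true it first sets the header
// that tells the worker to skip its HTML transforms (the baseline run).
function buildWptScript({ proxy, hosts, url, bypass }) {
  const lines = [];
  if (bypass) {
    lines.push(['setHeader', 'x-bypass-transform: true'].join('\t'));
  }
  for (const host of hosts) {
    lines.push(['overrideHost', host, proxy].join('\t'));
  }
  lines.push(['navigate', url].join('\t'));
  return lines.join('\n');
}
```

Calling it once with bypass: true and once with bypass: false produces the baseline and after scripts. Both use the same overrideHost lines, e.g. for www.example.com and ajax.googleapis.com.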
Gotchas
A few issues have tripped me up while writing and testing proxies:
overrideHost and Service Workers
WebPageTest's overrideHost command doesn't seem to work with requests dispatched from a Service Worker; those requests always seem to go to the original host.
From reading the code and talking to Pat, it appears it should work, but I've not had time to debug this issue further yet.
overrideHost and non-Chromium browsers
I could only get overrideHost to work in Chromium-based browsers: Chrome, Mobile Chrome and Edge.
Fragile DOM queries
When rewriting the HTML, I sometimes have to rely on fragile DOM queries, for example this selector to target the first script element in the head: head > script:nth-of-type(1).
As there's currently no way to extract the contents of an element, I can't check that the element passed to the handler is the one I meant to target.
Specific selectors, for example ones that use an id or src attribute, are more robust.
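For inline scripts at least, one possible workaround is the text() callback of an element handler. In the Workers runtime this callback receives an element's text in chunks, each with text and lastInTextNode properties, plus remove() and replace(). Buffering the chunks lets a handler inspect the whole script before deciding what to emit.

Here is a sketch of a conditional version of the defer wrapper. The class deferInlineScriptIf and its marker parameter are hypothetical, and each script's body is assumed to arrive as a single text node.

```javascript
// Wraps an inline script in a window 'load' listener only if its body
// contains `marker`, so a loose selector such as 'head > script' can't wrap
// the wrong script. Other scripts are re-emitted unchanged.
class deferInlineScriptIf {
  constructor(marker) {
    this.marker = marker;
    this.buffer = '';
  }
  text(chunk) {
    // Accumulate the script body, dropping each chunk from the output
    // until the last one arrives.
    this.buffer += chunk.text;
    if (!chunk.lastInTextNode) {
      chunk.remove();
      return;
    }
    const body = this.buffer;
    this.buffer = ''; // reset for the next matched script
    const output = body.includes(this.marker)
      ? "window.addEventListener('load', function() {" + body + '});'
      : body;
    // Emit the whole (possibly wrapped) body in place of the final chunk.
    chunk.replace(output, { html: true });
  }
}
```

For the HotJar example, this could be registered as .on('head > script', new deferInlineScriptIf('static.hotjar.com')). Scripts that don't contain the marker pass through as they were.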
The DOM that HTMLRewriter operates on is not the same as the one shown in the Elements panel in DevTools: the rewriter doesn't execute scripts, so by default the DOM queries can't be tested in the browser.
Using DevTools to block all requests except the one for the source HTML document and then checking the queries from the console is one way around this.
Closing Thoughts
Even though I've only used WebPageTest and Cloudflare Workers together with a few sites, it's clearly a powerful combination and it's likely to become a regular part of my client workflow.
At BrightonSEO I'm talking about Reducing the Speed Impact of Third-Party Tags and as much as I can talk about the theory, nothing beats a good demo.
For my demo I used a worker to rewrite parts of the page and choreograph how 3rd-party tags were loaded. The changes improved Largest Contentful Paint by a second for OPI's product page (top row).
The filmstrip is for an uncached view of the page, and although there's still plenty of room for improvement in the initial render time, it illustrates how a proxy can be used to quickly evaluate changes before committing them to the development lifecycle.
There are plenty of other optimisations to try, from replacing an embedded YouTube player with a lazy-loaded version or adding the lazy-loading attribute to out-of-viewport images, through to using Cloudflare's image optimisation and text compression features to reduce payload sizes.
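The lazy-loading change is one of the simpler ones to prototype. Below is a sketch of a handler that adds loading="lazy" to images that don't already set a loading attribute.

The rewriter can't know which images fall outside the viewport, so skipping the first few images is an assumed heuristic. The class name and skip count are hypothetical.

```javascript
// Adds loading="lazy" to <img> elements, leaving the first `skip` images
// (assumed to be near the top of the page) and any image that already
// declares a loading attribute untouched. Create one instance per request,
// since it counts the images it has seen.
class lazyLoadImages {
  constructor(skip = 2) {
    this.skip = skip;
    this.seen = 0;
  }
  element(element) {
    this.seen += 1;
    if (this.seen <= this.skip || element.hasAttribute('loading')) {
      return;
    }
    element.setAttribute('loading', 'lazy');
  }
}
```

This would be registered with .on('img', new lazyLoadImages()). The skip count is worth tuning per page, because lazy-loading an above-the-fold image can delay Largest Contentful Paint.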
A few clients ask me to evaluate the performance impact of 3rd-party tags before they implement them. As part of this process I typically query the HTTP Archive to find another site that uses the same tag and then test that site with and without the tag. Using a proxy, I could instead inject the tag into the client's own site and see what impact it has.
As yet, I've not got as far as rewriting or replacing external scripts and stylesheets, or exploring how Cloudflare's cache and key-value store can be used in the testing process.
But if you'd like some more sophisticated examples of the types of optimisations that can be implemented using Cloudflare's Workers, Pat Meenan has a collection of examples on GitHub.
Further Reading
Prototyping optimizations with Cloudflare Workers and WebPageTest, Andrew Galloni, Dec 2019
Pat Meenan's collection of Cloudflare Workers
Cloudflare Workers documentation