With our current tools it’s relatively easy to examine the performance of a single page, or of a journey a visitor takes through a series of pages, but when I examine a client’s site for the first time I often want a broad view of performance across the whole site.
Sometimes I want more than these tools offer - I might want to test from the USA or Japan, or capture measurements they don’t provide - and that’s when I use my own instance of the HTTP Archive.
I run a customised version of the HTTP Archive backed by my own instance of WebPageTest (WPT), with test agents in various locations.
Getting the HTTP Archive up and running is a bit fiddly but not too hard.
I didn’t make any notes when I got my own instance up and running, but Barbara Bermes wrote a pretty good guide - Setup your own HTTP Archive to track and query your site trends.
My own instance is slightly different from the ‘out of the box’ version:
The batch process that submits jobs to WebPageTest, monitors them and then collects the results is split into two separate processes:
- Submit the tests (I monitor the jobs via WebPageTest until they’ve completed)
- Collect the results for completed tests, parse them and insert them into the DB
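The split above can be sketched against WebPageTest’s standard HTTP API (`runtest.php` to queue a test, `jsonResult.php` to poll for results). The server name and API key below are placeholders, and this is a minimal illustration rather than the actual batch code:

```python
import json
import urllib.parse
import urllib.request

WPT_SERVER = "https://wpt.example.com"  # placeholder private WPT instance


def build_submit_url(page_url, location, api_key):
    """Build the runtest.php request that queues a test, asking for JSON back."""
    params = urllib.parse.urlencode({
        "url": page_url,
        "location": location,  # e.g. 'Dulles:Chrome'
        "k": api_key,
        "f": "json",
    })
    return f"{WPT_SERVER}/runtest.php?{params}"


def build_result_url(test_id):
    """Build the jsonResult.php request used when polling for completion."""
    return f"{WPT_SERVER}/jsonResult.php?test={test_id}"


def submit_test(page_url, location, api_key):
    """Process 1: submit a test and record the returned test id for later."""
    with urllib.request.urlopen(build_submit_url(page_url, location, api_key)) as resp:
        return json.load(resp)["data"]["testId"]


def collect_result(test_id):
    """Process 2: fetch results; statusCode 200 means the test has finished."""
    with urllib.request.urlopen(build_result_url(test_id)) as resp:
        body = json.load(resp)
        return body["data"] if body.get("statusCode") == 200 else None
```

Keeping submission and collection separate means a slow or stuck test never blocks new jobs from being queued.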
I’ve also introduced some new tables: one maps WPT locations to friendly names, and another groups the URLs to be tested so that I can test subsets of pages, or a single page across multiple locations, multiple browsers, etc.
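The extra tables might look something like the sketch below. The table and column names here are my own illustration (the real instance uses the HTTP Archive’s MySQL schema); SQLite is used only so the example is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Maps raw WPT location strings to human-friendly names
CREATE TABLE locations (
    location TEXT PRIMARY KEY,  -- e.g. 'Dulles:Chrome'
    label    TEXT NOT NULL      -- e.g. 'USA (Chrome)'
);

-- Named groups of URLs, so subsets of pages can be tested together
CREATE TABLE url_groups (
    groupid INTEGER PRIMARY KEY,
    name    TEXT NOT NULL       -- e.g. 'product pages'
);

-- Membership table: which URLs belong to which group
CREATE TABLE group_urls (
    groupid INTEGER REFERENCES url_groups(groupid),
    url     TEXT NOT NULL
);
""")
```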
These changes will be open sourced at some point later this year (some of them were part of a client engagement, and the client has agreed they can be released - probably in the Autumn).
Exploring a site
To gather the URLs to be tested I often crawl a site with sitespeed.io or another crawler before inserting the URLs manually into the HTTP Archive DB (spot the automation opportunity).
Once the URLs are in the DB, I schedule the tests with WPT, and collect the data when the tests complete.
Although I use the HTTP Archive for data collection and storage, I don’t actually use the web interface to examine the data.
Small images that might be suitable for techniques like spriting, or larger images that should be optimised further, are really easy to identify with simple SQL queries, such as:
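A sketch of the kind of query I mean, run against a cut-down stand-in for the HTTP Archive’s `requests` table (`mimeType` and `respSize` are real columns in that schema; the size thresholds here are arbitrary, and SQLite is used only to make the example runnable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the HTTP Archive 'requests' table
conn.execute("CREATE TABLE requests (url TEXT, mimeType TEXT, respSize INTEGER)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", [
    ("http://example.com/icon.png", "image/png", 1200),
    ("http://example.com/hero.jpg", "image/jpeg", 420000),
    ("http://example.com/app.js", "text/javascript", 90000),
])

# Small images - candidates for spriting (under ~2 KB)
small = conn.execute("""
    SELECT url, respSize FROM requests
    WHERE mimeType LIKE 'image/%' AND respSize < 2048
    ORDER BY respSize
""").fetchall()

# Large images - candidates for further optimisation (over ~100 KB)
large = conn.execute("""
    SELECT url, respSize FROM requests
    WHERE mimeType LIKE 'image/%' AND respSize > 102400
    ORDER BY respSize DESC
""").fetchall()
```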