From a Logfile to a Histogram With a Few Lines of R
I've been helping a client identify some performance issues with a new hosting platform they're in the process of commissioning.
The new platform has New Relic running, but unfortunately it only provides an average for response times. Averages can hide all manner of sins, so I prefer to look at the distribution of response times. I also wanted a way to compare against the existing platform, which has no monitoring on it.
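To make the point about averages concrete, here are two made-up samples of response times (illustrative numbers only, not measurements from either platform). Both have exactly the same mean, yet the typical request and the slowest request are very different, and a histogram shows that at a glance.

```r
# Two made-up samples of response times in milliseconds (illustrative only)
steady <- c(180, 190, 195, 200, 200, 200, 200, 205, 210, 220)
spiky  <- c(rep(100, 9), 1100)

# Both samples have exactly the same average...
mean(steady)  # 200
mean(spiky)   # 200

# ...but the typical request and the worst request look nothing alike
median(steady); max(steady)  # 200, 220
median(spiky);  max(spiky)   # 100, 1100

# Side-by-side histograms make the difference obvious
par(mfrow = c(1, 2))
hist(steady, main = "Steady", xlab = "Response time (ms)")
hist(spiky,  main = "Spiky",  xlab = "Response time (ms)")
```

An average of 200 ms describes neither sample well: most "spiky" requests are twice as fast as the "steady" ones, but one request takes over a second.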
The method I chose was to add the time-taken field to the IIS logfiles and plot histograms of it using R.
(Time taken includes network time, which may be an issue in some scenarios.)
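As a rough outline of the whole pipeline, here is a minimal R sketch, not necessarily the approach used later in this post. It assumes W3C extended format logs with the time-taken field enabled (IIS records it in milliseconds) and that every header block in a file lists the same fields. The helper names read_iis_log() and plot_time_taken() are hypothetical.

```r
# Minimal sketch, assuming IIS W3C extended logs with time-taken enabled.
# read_iis_log() and plot_time_taken() are hypothetical helper names.
read_iis_log <- function(path) {
  lines <- readLines(path, warn = FALSE)

  # Column names come from the "#Fields:" directive. Assumption: every
  # header block in the file lists the same fields, so the last one is used.
  fields_line <- tail(grep("^#Fields:", lines, value = TRUE), 1)
  fields <- strsplit(trimws(sub("^#Fields:", "", fields_line)), "\\s+")[[1]]

  # Drop every directive line, including header blocks that appear
  # part way through the file.
  data_lines <- lines[!startsWith(lines, "#")]

  # Values are space-separated; quote and comment handling are turned off
  # so characters in URLs or user agents are not misinterpreted.
  read.table(text = data_lines, col.names = fields, quote = "",
             comment.char = "", stringsAsFactors = FALSE,
             check.names = FALSE)
}

# Plot the distribution of time-taken and return the histogram object
plot_time_taken <- function(hits, breaks = 50) {
  hist(hits[["time-taken"]], breaks = breaks,
       main = "Distribution of response times",
       xlab = "time-taken (ms)")
}

# Usage (hypothetical log file name):
#   hits <- read_iis_log("u_ex130101.log")
#   plot_time_taken(hits)
```

Passing check.names = FALSE keeps the original field names, so the column is accessed as hits[["time-taken"]] rather than the mangled time.taken that R would otherwise produce.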
R is a tool for statistical computing that makes crunching numbers and turning them into graphs relatively easy. When I first started using R I found it had a bit of a learning curve, and I still have to work hard to do anything that isn't trivial. That's probably a mixture of all the statistical knowledge I've forgotten and the language and libraries themselves.
IIS logfiles start with a multi-line header (sometimes there's one part way through too!) that looks like this: