Tagged: Web Performance

Warming up for Velocity 2013 in Santa Clara

I have attended several conferences in the last few years. The first one that really changed my “developer life” was the Velocity 2011 conference in Santa Clara. I have always been interested in optimizing and being diligent about the web, however, my learning during those three days in Santa Clara has influenced my every day life and the way I see performance.

I truly admire each and every speaker and attendee at the conference because they all share the same passion: Optimizing the web and making performance count. I am honoured to announce that it is my turn this year to give back to that same community of people and share what I have learned over the last few years and what I have been applying at Canada’s public broadcaster, the CBC.

Our talk “The Canadian Public Broadcaster On A Diet: Slimming Down For A Whole Nation“ will focus on (mobile) web fitness and how to “slim down”. Tips and tricks will be shared about how to stay in shape when developing (mobile) sites for millions of people.

My talented co-worker Blake and I will be talking about how we apply performance optimization at the CBC, one of Canada’s largest web properties with over 5 million pages. As a publicly funded organization, all Canadian eyes are on us making sure we stay on budget and deliver quality and optimized content to users.

While Blake will be talking more about the backend, server and CDN aspects of performance optimization and tips, I will be sharing information about how we optimize and tweak performance from a frontend development and automated deployment perspective, basically – how to get and stay in shape.

Don’t worry; this definitely will not be your typical boring and horrifying boot-camp experience! Our talk will utilize fun and catchy analogies to explain the weight and performance of pages. I will be your honest CBC “fitness trainer”, telling the audience about the page weight of our sites on multiple platforms, how we measure performance and set budgets. However, putting our content on a scale will tell the truth: a content breakdown of our pages will help the audience understand how content is structured and where we can “slim down”, but also where a fitness routine cannot help.

Keep us company while we share some insights about setting up our own HTTP Archive instance as a tool – or how I would describe it: the BMI of web sites – to compare our own weight to the public HTTP Archive instance. We will share some queries from our HTTP Archive database to help identify bottlenecks, and we will tell you about how we discovered problems with some unnecessary weight that we thought we didn’t have.

Additionally, sweet and dangerous temptations will be placed in front of your eyes, the kinds that we all have to deal with when creating high traffic sites, including, 3rd party scripts that could significantly harm the performance of our sites when not properly implemented. We compare client-side versus server-side 3rd party implementation. We will also reveal the amount of improvement we saw in load time once we turned off all ads on our mobile touch site for a weekend.

During our talk, you will also hear about our fitness stack regarding how we monitor our fitness level, and why it is so important to stay on a strict exercise schedule and avoid gaining too much unwanted weight, which can happen without even knowing it. If you want to exercise and stay in shape, there are tons of great tools out there to help you achieve that. We will cover how we organize and optimize our sites, our releases and deployment and how easily you can include tools in your deployment process to automate performance optimization.

If you want to know how we use RUM in combination with synthetic testing, and what our RUM numbers reveal, then you shouldn’t miss out on our talk.

Lastly, we will explain the challenges that we have faced, as the national news broadcaster in a world of ever changing news, with the potential for a breaking news story at any moment, that could drive our traffic to the roof, and how we need to respond to that.

Come join our talk and if you like, wear your favorite running shoes because you never know, you might want to start exercising right after.

We look forward to meeting you all!

More details to our scheduled talk and location: http://velocityconf.com/velocity2013/public/schedule/detail/27973

Advertisements

Setup your own HTTP Archive to track and query your site trends

Be honest, ever wanted to play “Steve Souders” for a day and pull some cool stats or trends about some web sites of your choice? Well, how about setting up your own HTTP Archive then?

Httparchive.org is an excellent tool to track, monitor and review how the web is built. You can dig into trends around page size, page load time, CDN usage, distribution of different mimetypes and many other stats.

You can download an HTTP Archive MySQL dump and the source code from the download page and play around yourself with the current data. For example, do what Stoyan Stevanov did by asking yourself some questions: “Hm, I wonder what are common mime types these days”. Once you’ve setup the database, you can easily query anything you want.

However, what I personally find the most intriguing and fun is applying all of this to sites of of your choice. Alright, let’s break this down: if you’re famous and your site is listed under the Top * Alexa sites, then you can use the official dump, if your target site is a wee bit less famous and not part of any of the crawled sites, you might want to start using your own database and local instance of HTTP Archive. That way, you can run this handy tool on any of the web sites you want to test.

Things to consider before you get started

You need MySQL, PHP and your own webserver running. If you choose to run your own private instance of WebPagetest, you won’t have to request an API key. I decided to ask Patrick Meenan (pmeenanATwebpagetestDOTorg) for an API key with limited query access. That’suffcient for me for now, if I ever wanted to use more WebPagetest runs per day, I’d probably want to setup a private instance of WebPagetest. I’ve done this before but my computer had to be replaced, and I haven’t had the time after to set this up again.

Sample setup

bulktest: That’s the folder you really want to understand and work with when setting up your own little HTTP Archive baby.

  • bulktest/README.txt: This file gives you a general intro on how to use the folder, I recommend to read this.
  • bulktest/bootstrap.inc: In case you choose to us a private API key for WebPagetest, you will need to update this file with the provided key

To run a nice little batch, you want to execute the following scripts after each other via CLI (default setup for security reason)

  • bulktest/batch_start.php: This script takes pre-defined list of URLs (importurls.php) that you can specify or change. By default it’s downloading the latest  Alexa list (downloadAlexList()) and imports those into the urls table. I’ve changed this so it’s picking up my own csv file with the urls I want to crawl but you can also customize this the way you want it (default setup needs to run via CLI)
  • bulktest/batch_process.php: Run this as often after each other until you get confirmation that your runs were successfully recorded (default setup needs to run via CLI)

Batch summaryIt always gives you a nice batch summary at the end so you know where you are at (see screenshot)

More detailed steps on how to install HTTP Archive can be found under the blog post Setting up HttpArchive private instance. As suggested by the README.txt file and this blog post, it’s probably useful if you setup cronjobs in your environment to automate the batch steps.

Front-end piece: Visualizing your trends and stats by filling the charts

Congrats! Assuming you’ve successfully setup your own HTTP Archive instance  – wasn’t that fun and the bit of pain worthwhile? Now you can start viewing those charts and investigating trends and stats targeted to your defined URLs. The beautiful thing about having your own instance is that you can be your own master of data visualization: you can now create additional charts beside the ones that came out of the box provided by the default HTTP Archive setup.

And if you don’t like Google chart tools, you may even want to check out d3 or Highcharts to use instead.

From now on, the sky is the limit. Nobody can stop you now, my friend – you can even run some kick-ass raw database queries if you don’t really care much about the front-end visualization (I do ;))

Back-end piece: Querying the database directly

Sometimes, you want to get some questions answered without creating a pie or a chart. That’s when you can make use of the MySQL tables directly that have been setup for you (via schema sql file and filed by your batches).

Let’s run a simple query on the requests table.

For example, some of our sites use YUI, some use JQuery – but we would really like to avoid having pages serve both.

A simple sample query like the one below could help identify those sites:

select req_referer from requests where url like '%/i/l/yui%' or url like '%jquery-%.js' group by req_referer

Be prepared, some setup time is required

I’m not going to lie, it took me some after-work evenings, many debug statements via PHP to set everything up so I could run proper batch_start and a proper batch_process commands and fill in those pies and trends. Here are a few things to watch out for

  • I hadn’t installed pcntl with my PHP version, so I needed to set this up first. You will need this to run batch_process.php. Installing pcntl for php on osx Lion blog post helped a lot – Thank you Jacob!
  • You might have to adjust some of the tables values, I got lots of mysql insert/update errors regarding default values not set for certain fields
  • After I received my API key, I still had to change the WebPageTest URL after Patrick pointed me to the correct URL (Thanks again). The $gWPTUrl variable was set to http://httparchive.webpagetest.org as default instead of http://webpagetest.org in settings.inc
  • Some of the tables require you to have a temp/dev version of it as well, e.g. requests, urls, statsdev etc. Some of the scripts look for e.g. requestsdev. You might have to copy a few of the original tables during setup process. You can setup the naming convention in dbapi.inc

Next Steps

Where am I at? Well, I just finished a few successful batch_processes on a selected number of URLs. It’s fun to monitor trends based on the URLs I put in. I will be collecting more ideas and uses cases over the next few months, possibly also adding some more charts applicable to my needs.

If time permits, I hope to be sharing more of my customization and use cases of HTTP Archive at this year’s Velocity conference in San Jose. And if not there, I’ll try to update this blog post as best as possible. If you have any questions or suggestions, please leave a comment below.

I am always happy to receive feedback or happy help out with some roadblocks while setting this all up.

Thanks

to….

Advanced Web Performance Techniques – Part 1

Happy New Year, Everyone!

Web performance is something dear to my heart and is something that can make me browse the web for hours.

My favorite holiday treat is @perfplanet‘s performance calender, every year!

  • Who doesn’t like to improve things?
  • Who doesn’t like to make things more efficient and faster?
  • Who doesn’t want Steve, Stoyan or Ilya give us that one last great webperf thought before the year ends?

Today, I’d like to write about a few techniques I’ve read about in the last few months that I consider advanced and experimental in regards to web performance.

That being said, today’s post is not about the common principles on how to make websites faster, posted by Yahoo! several years ago. While those are still valid and should be followed by every web developer and organization on this planet, I want to focus on those not so commonly known principles and techniques on how to enhance the performance of your mobile web app/site (at least the ones that still give me the “awwww – I didn’t know, wow!”). Mostly principles that can be applied to your server or CDN.

Techniques pushed by Google

I love Google’s “Make the Web Faster” site.

I only recently discovered the (experimental) features of mod_pagespeed, the open source Apache HTTP server module that automatically applies web performance best practices. Instead of making web devs do the performance work, let the web server do that job by applying filters that enhance performance.

mod_pagespeed filter: Canonicalize JavaScript

When developing (mobile) websites, developers tend to include commonly used libraries, JQuery being one of the top ones. So, imagine you browse from website A using JQuery to website B also using JQuery. The browser automatically fetches JQuery again from domain B, although it was just requested from domain A.  A waste of bandwidth, something seems inefficient here, doesn’t it? So, the great minds from Google came up with a solution to reduce the obvious inefficiency by introducing the canonicalized JavaScript filter: The idea is that you re-use commonly used libraries (e.g. JQuery) on the web by first replacing it with the equivalent library hosted on Google’s CDN and then when browsing e.g website B, use the copy of JQuery that was previously fetched from the shared library (Google’s CDN).

More about that in this blog post or the direct link.

mod_pagespeed filter: Combine JavaScript (Experimental)

Everyone who cares about web performance knows that reducing HTTP requests is one method to enhance performance. How about your web server doing this job for you? It reduces round-trips to the server and also reduces latency issues. The combine_javascript filter concatenates all used JavaScript files on one’s page and replaces it with a single one.

Check out their demo page.

More fun mod_pagespeed filters here.

Delta Delivery

This is a nice one too. I found the idea and presentation at one of the W3C Performance Working Group meetings (excellent presentations).

How often does a web developer change the entire core of their JavaScript framework or libraries? Not that often, right? So on average, one might only add another feature to the JavaScript core files that visitors need to fetch while re-visting the website. The proposed delta technique aims to reduce the size of JavaScript and stylesheets by sending only the difference “between what the client has (in cache or local storage) and the latest version”. The benefit here is that the delta will, most of the time, be smaller than the original source. “Server computes and returns encoded delta between version X and Y which is much smaller than Y itself”.

More information about “Browser Enhancements to Help  Improve Page Load Performance  Using Delta Delivery

Others & General

It’s not only Google who tries to be smart about predicting users behaviors and trying to enhance speed performance based on situational conditions.

  • Akamai has always been on the forefront of optimization and delivering content. Their situational performance techniques are fun and easy to follow. Aqua ION focuses on front-end optimization.
  • Also Cloudflare suggested their own technique called Rocket Loader to combine JavaScript and Stylesheets more efficiently.
  • Guy’s situational performance post on perfplanet’s calendar 2012.

And some client-side techniques that I don’t want to forget to mention:

This is by no means everything innovative and great that is out there to make your site faster. I appended “Part 1” to the title of this post because I know I will soon find more great techniques that will be worth mentioned. So please stay tuned.

And of course, feel free to share any other exceptional tricks worth mentioning.

Building and designing energy-efficient mobile websites

I was introduced to an excellent paper at the Future Insights Live in Las Vegas this year during a talk I attended and really liked by @glan.

The paper is called “Who Killed My Battery: Analyzing Mobile Browser Energy Consumption” and was presented at WWW 2012, the Mobile Web Performance Conference, France, in April 2012.

For anybody who is passionate about & interested in mobile performance and wants to know what actually kills the battery life on a mobile device when browsing the web, this is a must-read!

A few take-aways

  • Many sites are poorly optimized for energy use and rendering them in the browser takes more power than necessary.
  • Rendering images takes a significant fraction of the total rendering energy.
  • Some sites like Youtube spend around 1/4 of their rendering energy on images.
  • Javascript is one of the most energy consuming components in a web page.
  • Using generic Javascript libraries simplifies web development, but increases the energy used by the resulting pages.
  • Rendering JPEG images is considerably cheaper than other formats such as GIF and PNG for comparable size images
  • …more below in slideshare presentation

If you don’t have time and rather want a summary, you can flip through the slideshare presentation I created for an internal working group meeting at my current company.

Link to paper: http://www2012.org/proceedings/proceedings/p41.pdf