Simulating Frontend SPOF – The day a tiny 3rd party script almost slowed down the entire Internet

How realistic is it that a script you didn’t even write could dramatically slow down your site, and other major sites as well? Keep reading… scripts can slow down sites, and it hurts to watch!

I watched Steve Souders’ 2012 Fluent talk on High Performance Snippets (a must-watch for all SPOF fans) and got inspired to test how an “innocent” 3rd party script (I call them 3rd party monsters) that is not loaded properly can become a single point of failure (SPOF) and make a site very slooooooow to load.

Developers are always proud and optimistic about their code. When it comes to including 3rd party scripts (code they don’t usually touch), they assume that providers like Google never go down, that a non-responsive ad server like DoubleClick won’t hurt, or that Twitter won’t have server failures. 3rd party script developers try their best to make it easy and painless for us to include their high-performing scripts in our sites. That’s a fact. True, but also not the whole truth: they can only do so much. If you don’t include the script properly on your end and their service goes down, their high-performing code won’t be able to help you at all. The rule of thumb is to include those scripts asynchronously. That way your content won’t be blocked from rendering if the 3rd party service is down.
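Here is a minimal sketch of what an asynchronous include can look like, in the spirit of the snippets many providers hand out. The helper name loadScriptAsync and the example URL are made up for illustration and are not any provider’s actual snippet; the document object is passed in as a parameter only so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: injects a <script> element through the DOM so the
// browser fetches it without blocking HTML parsing or rendering.
function loadScriptAsync(src, doc) {
  var script = doc.createElement('script');
  script.src = src;
  script.async = true;
  // Insert before the first script on the page. There is always at least
  // one: the script that is running this code.
  var first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(script, first);
  return script;
}

// In a browser (the URL is an assumption, replace it with your provider's):
// loadScriptAsync('https://thirdparty.example.com/widget.js', document);
```

If the provider is down, the request for the script still hangs, but it no longer holds back the rendering of your own content.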

However, scripts that use document.write can’t be loaded asynchronously (unfortunately). Read more about this in the great Krux post and some of Steve Souders’ posts.

It’s kind of the elephant in the room to me: you pray that, say, Twitter doesn’t go down. Meanwhile you are either too afraid to test it, over-confident that it won’t break your page, or you simply don’t know how you would test this scenario in the first place. Am I right? Well, what if you could run a quick test on a web site and pretend all of its 3rd party scripts and providers were down? Let’s play the “3rd party scripts game”: would your web site still render… how confident are you?

Simulating SPOF – Slow down your own site until it really hurts

Are you ready for this? First, edit your hosts file to point to a blackhole IP address for simulation (I used the blackhole IP address Steve shared in his talk on slide 9).

sudo vi /etc/hosts

While setting up my test, I don’t want to play the really bad gal (yet) and assume all 3rd party providers are down. I’d like to start with the simplest yet most used and most harmful domain: ads.doubleclick.net. A lot of web sites include ads and use DoubleClick.

So let’s use this domain for our blackhole test. By all means, you can add more 3rd party scripts to your hosts file.

# add this line to your hosts file
72.66.115.13 ads.doubleclick.net

Once you’ve updated your hosts file, remember to flush your DNS cache (the command below is for OS X).

dscacheutil -flushcache

Now, open your browser (with the cache disabled so your browser is not using any DoubleClick scripts from the cache). Type in your site’s URL and be prepared for the worst. How long will it take for the website to load?

That’s a very easy (scary) and quick way of evaluating what is on your critical rendering path and obviously (now) what should not be on it anymore!

I ran this test on our site and let me tell you, it hurt. Period. With DoubleClick faked as down, it took almost 1.5 min for cbc.ca to display useful content. Only then did the browser finally give up.

Aborting

I wasn’t ready to stop the game. I wondered if it’s just our site that doesn’t properly handle the outage of a single domain such as ads.doubleclick.net. So I continued, tried the following random websites, and measured the time it took to see useful content on each of them.

URL                        Time passed until useful content appeared
http://www.people.com      ~4.5 mins
http://www.bbc.com         ~2.5 mins
http://www.amazon.com      Fine, didn’t seem to use DoubleClick
http://www.cnn.com         Fine, they seemed to be doing the proper handling
http://www.facebook.com    Surprise, surprise: Facebook doesn’t use DoubleClick. They use their own, so no real delay here.

If you don’t want to edit your hosts file and want more concrete waterfall and timing information as well as video capture, try what Steve Souders suggested in his Fluent talk: use the scripts (now SPOF) box at webpagetest.org to include the DNS changes. The results will give you great details on how the website performed, with and without SPOF.

SPOF doubleclick

Note: I’ve tried the WebPagetest SPOF setting myself and didn’t notice a big difference between the non-SPOF and SPOF versions; my suspicion is that WebPagetest might not be using an empty cache for the SPOF setting. The tests I ran manually on my local machine showed a more visible negative impact of the SPOFs (I shall confirm this).

3rd party scripts are everywhere

It was verified last month that 18% of the world’s top 300K URLs load jQuery from Google Hosted Libraries. In theory, that means that if the service goes down and a web site loads jQuery from ajax.googleapis.com without a fallback, the site might not work at all. Isn’t that scary? If your web site already uses its own CDN, serve scripts like jQuery from there instead of from Google’s CDN. Avoid 3rd party dependencies as much as possible.
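For sites that do keep loading jQuery from ajax.googleapis.com, a fallback can be sketched roughly as follows. The function name ensureJQuery and the local path /js/jquery.min.js are assumptions for illustration; window and document are passed in as parameters only so the logic can be checked outside a browser.

```javascript
// Meant to run in an inline script placed right after the CDN <script> tag.
// If the CDN copy failed to load, window.jQuery is undefined and we fall
// back to a copy served from our own host.
function ensureJQuery(win, doc, localSrc) {
  if (win.jQuery) {
    return false; // the CDN copy loaded, nothing to do
  }
  // document.write keeps the fallback synchronous, so code further down
  // the page can still rely on jQuery being present.
  doc.write('<script src="' + localSrc + '"><\/script>');
  return true;
}

// In a browser (the local path is an assumption):
// ensureJQuery(window, document, '/js/jquery.min.js');
```

Keep in mind that the fallback only runs after the request to the CDN has failed or timed out. It protects you from a broken page, not from the wait.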

I ran two queries on my local HTTP Archive database (dump from March 2013), following the same filter that Steve Souders used above. I restricted the queries to the 292,297 distinct URLs from the March 1, 2013 crawl (with their respective unique pageids). I wanted to see how many of the top 300K URLs use Twitter widgets or any sort of Facebook scripts (without distinguishing whether they were loaded synchronously or asynchronously).

Twitter

13% of the Top 300K URLs include Twitter scripts somewhere on their page.

Facebook

29% of the Top 300K URLs include Facebook scripts somewhere on their page.

Feel free to extend this exercise to include more 3rd party domains.

Cached 3rd party scripts

You can’t rely on the browser cache to shield you from a 3rd party outage, even one that lasts less than a few hours. 3rd party providers tend to set a very short cache time on their scripts so that they can change the file frequently.

That setup plays against you if you don’t load 3rd party scripts asynchronously. For example, Twitter’s widget.js has a cache time of only 30 mins. I wonder what change could be so important for Twitter that it can’t wait more than 30 mins to reach the sites consuming this widget.js file.

So imagine the following: at 9 AM you go to a site that loads the Twitter widget synchronously at the top of the page (bad!), and you get the latest, freshest version of widget.js. Twitter goes down at 9:10 AM. At 9:15 AM you go back to the site, and everything is still fine: you won’t see any problems because you are getting the Twitter widget script from the browser cache. But what if Twitter is still down at 9:40 AM and you visit the same page again? You are now past the cache expiry time, so your browser requests a new version of the Twitter script and tries to reach the Twitter server that is still down. The request for the Twitter script hangs until it times out, and (with the setup described above) it blocks the page content from rendering. Bottom line: once the cached copy has expired, you won’t see any content until Twitter is back up or the request times out.

It’s easy to check those cache times yourself, e.g. use Chrome dev tools and look at the response headers of those 3rd party scripts.

The screenshot below shows Twitter and Facebook’s cache-control settings:

[Screenshots: Cache-Control response headers for the Twitter and Facebook scripts]
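The 9 AM scenario above boils down to a simple freshness check. The sketch below is a simplification that looks only at the age of the cached copy against a max-age value; the function names isFresh and at are made up for illustration, and the 30 minutes is the cache time quoted above.

```javascript
// Returns true while a cached response is still fresh, i.e. the browser
// can use it without contacting the server.
function isFresh(fetchedAtMs, maxAgeSeconds, nowMs) {
  return (nowMs - fetchedAtMs) < maxAgeSeconds * 1000;
}

// Converts a clock time into milliseconds since midnight, to keep the
// example readable.
function at(hours, minutes) {
  return (hours * 60 + minutes) * 60 * 1000;
}

var MAX_AGE = 30 * 60;  // 30 minutes, in seconds
var fetched = at(9, 0); // first visit at 9:00 AM

console.log(isFresh(fetched, MAX_AGE, at(9, 15))); // true: served from cache
console.log(isFresh(fetched, MAX_AGE, at(9, 40))); // false: browser hits the (down) server
```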

Conclusion

In order to really focus on your site’s performance, you need to isolate the (potentially bad) performance of 3rd party monsters (the ones that you decided to invite to your site). Don’t make your users wait for your own content if a 3rd party provider is down.

Why bashing Facebook’s HTML5 hybrid app is mean and doesn’t make sense

Whaaaaat? Another one of those “Facebook’s native app is faster” posts – No! You won’t hear this from me.

After I had heard from posts and people that FB would be changing the core mobile architecture of their iOS app from more-or-less hybrid back to native, I had been waiting impatiently for the release. So there it was in the App Store. As a big advocate for the mobile web, I was almost afraid to launch the new FB native iOS app: I didn’t want it to be faster, because I could not comprehend why a company like FB could not have its engineers figure out this new HTML5 beast.

Well, I launched the app. It loaded. Fast, faster than I thought, and the scrolling wasn’t sluggish anymore. Native development succeeded? Not so fast, my friend! I made sure to read the FB engineering blog post that explained the architecture behind the latest iOS app they built.

“For areas within the app where we anticipate making changes more often, we will continue to utilize HTML5 code, as we can push updates server side without requiring people to download a new version of the app.”

To me that sounds pretty hybrid, no? Nobody said hybrid is all webviews and no native components. So why the fuss? They identified the areas where pure HTML5 didn’t work quite well yet, either because their backend/architecture was not set up to deal with the mobile web/HTML5 parts (maybe due to bloated old JS), or because certain elements should not be done via a webview in general, e.g. infinite scrolling, which can’t perform well within an iOS webview.

Here, Facebook’s engineering manager Dave Fetterman explained, about a year ago, FB’s approach to HTML5 and its challenges.

“HTML5 is probably the way that we should have done it. This is the way we get to do it now because HTML5 has changed so much under our feet. The initial attempt at building a hybrid application, there were certain things in HTML5 that weren’t ready yet and we said forget it, we are going to keep moving forward.”

There have been many people out there expressing their views on Facebook and HTML5. Here are some I liked and could agree with.

From the comments section at mashable.com, Esteban Saa writes:

I’ll take this opportunity to write about how Apple in order to maintain control over their app market are creating problems for HTML5. For instance they won’t let use run our web apps at full JS speed stating security issues, which everyone knows is BS. Don’t get me wrong I love Apple products, but practices like these really hurts the evolution of software.

Mobile Marketer interviewed people in the industry and got the following opinion:

“Facebook by all accounts, didn’t make mobile its top priority early on,” he said. “As a consequence, they underinvested in their mobile experience (i.e. HTML5) over the past two years. (…) With other companies such as Netflix, LinkedIn, and Instagram, they committed fully and put their best resources on it. Facebook tried to recycle too many things from their desktop technology into HTML5 instead of starting with an approach fully optimized for mobile.”

And finally, Tobie from FB came forward to explain what’s slowing down the Facebook HTML5 app:

Scrolling performance
------------------------
I've already started sharing some of it with the W3C WebPerf WG[4]. Will
continue bringing it to other relevant WG in the upcoming weeks.

This is one of our most important issues. It's typically a problem on the
newsfeed and on Timeline which use infinite scrolling (content is
prefetched as the user scrolls down the app and appended) and end up
containing large amounts of content (both text AND images). Currently, we
do all of the scrolling using JS, as other options were not fast enough
(because of implementation issues).

My honest opinion based on everything I read on the internet, etc, and my own brain:

  • FB underestimated the mobile web, hence they didn’t care much about mobile and did not build their previous hybrid app with any potential growth of the mobile web user base in mind.
  • They didn’t care about mobile that much because they didn’t know back then how to monetize it with ads. Of course, Facebook is huge and has many smart engineers working there, but they also have smart product managers, and we all know that FB is not making (much) money with their mobile products because they don’t include ads. I would argue that the focus on mobile has never been that big at Facebook to begin with, and hence the apps and the mobile web were always low-priority.
  • They tried their best with the knowledge they had back then, and now they share their problems and frustration: good for us, so we can learn from their mistakes and do better.
  • Despite Zuckerberg admitting they made a mistake counting on HTML5, they will still continue to focus on the mobile web because research says there are still more users accessing Facebook via the mobile web than via Android and iOS together.
  • I agree with the codefessions.com article: “It seems as it was the rethinking of the architecture that fixed everything, not the underlying technology.”
  • HTML5 hybrid apps are not the solution for all businesses: Facebook’s infrastructure is unique and needs to handle thousands of requests constantly. Watch the Velocity video (around minute one) for more details. It’s a challenge to handle loads like this, not only for mobile but also for desktop.
  • Web APIs still need to catch up to deliver native-like interaction properly, hence device-specific features might still need to be developed natively.

My suggestion for anybody who is thinking of not developing hybrid mobile web apps now: consider the following thoughts, as they might convince you to shift your focus back to building hybrid/HTML5 apps.

  • Don’t use bloated JS within your webviews: you don’t need to put all your logic on the client side. Server-side mobile (app) detection and enhancements can give you higher performance as well.
  • Draw a comparison to Facebook: are you a dynamic content / social media company or a content-centric company like a news broadcaster? Do you rely on complicated SQL queries or do you “just” show simple content results? How many people access your content, and what kind of daily/hourly traffic do you see? What areas need to perform fast?
  • If you need to load content on the fly, e.g. with infinite scrolling, think about moving such a component to a native view. Not everything has to be a webview! Be smart and honest about it.
  • Evaluate each component of your hybrid app and follow this rule: if performance and user experience are way better with native code, use native code. I believe certain elements should be done natively, no doubt; native navigation items behave less sluggishly than mobile web implementations. However, certain static or less dynamic content areas could be served in a webview.
  • Be aware of things HTML5/the browser can’t do or has problems with and try to fix them, e.g. the 300ms click delay.

One thing you should always strive for is making your product better and faster, no matter what you choose to use, HTML5 or a native codebase.