How do large Web sites handle the load of millions of visitors a day?
Since most large Web sites meet all of these conditions, they need significantly larger infrastructures.
When a site shows a different host name in its URL each time you visit (for example, www1.xyz.com, www2.xyz.com or www3.xyz.com), you know that the site is using the second approach at the front end. Typically the site will have an array of stand-alone machines that are each running Web server software. They all have access to an identical copy of the pages for the site. The incoming requests for pages are spread across all of the machines in one of two ways:
One of the surprising things about Web sites is that, in certain cases, a very small machine can handle a huge number of visitors. For example, imagine that you have a simple Web site containing a number of static pages (in this case, "static" means that everybody sees the same version of any page when they view it). If you took a normal 500 MHz Celeron machine running Windows NT or Linux, loaded the Apache Web server on it, and connected this machine to the Internet with a T3 line (45 million bits per second), you could handle hundreds of thousands of visitors per day. Many ISPs will rent you a dedicated-machine configuration like this for $1,000 or less per month. This configuration will work great unless:
There are three main strategies for handling the load:
The advantage of this redundant approach is that the failure of any one machine does not cause a problem -- the other machines pick up the load. It is also easy to add capacity in an incremental way. The disadvantage is that these machines will still have to talk to some sort of centralized database if there is any transaction processing going on.
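To make the redundancy idea concrete, here is a minimal sketch in Python of one possible way to spread requests across identical machines: simple round-robin rotation that skips any machine marked as failed. The rotation scheme is an assumption chosen for illustration, not necessarily how any particular site does it, and the class name `RoundRobinPool` and its methods are hypothetical. The host names reuse the www1.xyz.com example from above.

```python
class RoundRobinPool:
    """Hypothetical pool of identical Web servers (illustration only)."""

    def __init__(self, servers):
        self.servers = list(servers)      # rotation order
        self.healthy = set(self.servers)  # machines currently able to serve
        self._next = 0                    # index of the next machine to try

    def add_server(self, server):
        # Adding capacity incrementally: append one more machine to the rotation.
        self.servers.append(server)
        self.healthy.add(server)

    def mark_down(self, server):
        # Record a failure; the machine stays in the rotation but is skipped.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Bring a repaired machine back into service.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Walk the rotation at most once, returning the first healthy machine.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


pool = RoundRobinPool(["www1.xyz.com", "www2.xyz.com", "www3.xyz.com"])
first_four = [pool.pick() for _ in range(4)]

# Simulate one machine failing: the others pick up its share of the load.
pool.mark_down("www2.xyz.com")
after_failure = [pool.pick() for _ in range(3)]

# Add a fourth machine without touching the existing ones.
pool.add_server("www4.xyz.com")
```

Note that this sketch only covers the front end. If the pages involve transactions, every machine in the pool would still need to reach the same centralized database, which is the disadvantage described above.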
Microsoft's TerraServer takes the "single large machine" approach. TerraServer stores several terabytes of satellite imagery data and handles millions of requests for this information. The site uses huge enterprise-class machines to handle the load. For example, a single Digital AlphaServer 8400 used at TerraServer has eight 440 MHz 64-bit processors and 10 GB of error-checking and correcting (ECC) RAM. See the technology description for some truly impressive specifications!