Caching Dynamic Web Content to Increase Dependability and Performance

Written by Brian Moon and Daniel Beckham, dealnews.com developers, in 2001

Introduction

As more and more Web sites discover the advantages of storing content on a database server, they can also introduce dependability problems that caching helps minimize. Caching decreases the number of repetitive queries for unchanging data and frees up resources for other queries, particularly complex queries like searching.

In an ad-based revenue world, the ads on your site have to be there to make money. When the database server at dealnews.com went down one day and the ads stopped flowing, our CEO had a fit. He looked at us and said, "I want an ad system that works even if the database is down." At the time, our jaws dropped. After regaining our composure, we started thinking about the problem and came to a realization: the data for our ads did not change very often, and it could easily be exported into files that the ad system could use instead of getting the information straight from the database. From that point on we were sold. Every project since has incorporated caching in some way. If you visit dealnews.com today, the only pages that ever make queries to the database are the ones that involve searching through our archives. Does it help? You bet. There have been times when some code slipped through that used direct database access. Sure enough, the servers would start to grind.

dealnews.com is not alone in this decision. We asked Rob "CmdrTaco" Malda of Slashdot if and how Slashdot uses caching. He told us that they cache anything they can and that caching "saves many clock cycles". A lack of performance from their servers is what led them to caching in the first place. He said that more hardware would have helped, but at the time they couldn't afford the hardware that would have been needed to do the job. The complete email interview can be found in the Appendix.

What is Caching?

Caching is simply storing data for later reuse without having to go through the process of generating it all over again.

Caching is all around us in our computers and on the web. It's often thought of as a new idea in a world where dynamic, user-driven content is popular, but many aspects of running a web site already involve caching. The CPU in your machine caches instructions. Database engines cache query results for future requests. NFS clients cache files retrieved from NFS servers. Browsers cache images and entire pages to make the web experience faster.

In this discussion we will cover two types of caching:

  1. Caching the results of repetitive database queries
  2. Caching actual HTML, XML or WML output to the browser

By caching the results of queries, we mean keeping the results of a database query in a reusable format for later use instead of retrieving the same data from the database server again. Caching the actual output that is sent to the browser removes the need to generate the same HTML, XML, WML, etc. over and over again. Pre-generated query results or browser output can be stored in local files for quick retrieval during subsequent page requests.

Why Use Caching?

Dependability

Have you ever visited a web site and seen database connection errors instead of content? Sites that rely solely on database servers for their content are vulnerable if the database ever goes down or becomes corrupted. Smart caching of your database data can keep your site running long after your database is no longer serving requests.
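
To picture what this can look like in practice, the following PHP sketch falls back to the last good cache file when the database connection fails. It is only an illustration of the idea (the table, credentials and file name are made up for this example); the reusable caching functions we actually use are developed later in this paper.

PHP:

  // Hypothetical sketch: serve the last good cached copy when the
  // database is unreachable.  File name, credentials and query are
  // placeholders for this example.
  $cache_file = "./data/headlines.cache";

  $conn = @mysql_connect("localhost", "apachecon", "apachecon");

  if($conn){
      // database is up: run the query and refresh the cache
      $SQL = "select id, headline from articles
              order by creation_time desc limit 10";
      $res = mysql_db_query("apachecon", $SQL, $conn);
      $articles = array();
      while($rec = mysql_fetch_assoc($res)){
          $articles[] = $rec;
      }
      // save a copy for the next time the database is down
      if($fp = @fopen($cache_file, "w")){
          fputs($fp, serialize($articles));
          fclose($fp);
      }
  } elseif($fp = @fopen($cache_file, "r")){
      // database is down: serve the stale, but usable, cached copy
      $articles = unserialize(fread($fp, filesize($cache_file)));
      fclose($fp);
  } else {
      // no database and no cache; there is nothing left to serve
      $articles = array();
  }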

Performance

It's a common misconception that using a database is as fast as, if not faster than, straight file I/O. While the two can be very close in speed, the gap widens as queries become more complex and the database becomes more loaded. Performance can always be increased by throwing more hardware (and more money) at the problem, but not all of us can afford those ten-foot-tall rack servers we see in Microsoft ads and commercials. Smart caching of your repetitive database queries and website content will allow your site to serve more users on the hardware you already have.

Caching Negatives

Caching can add a layer of complexity to your web site. Code has to be written to create the layer of cache between your database and what the user sees in the browser. Data synchronization is also a concern and must be kept in mind when designing your method of caching. Proper mechanisms have to be in place to keep cached data from getting too old, whether that is a hard expiration time or automated regeneration of the data at specific intervals or triggers. An additional pitfall for larger sites that use load balancing is the possibility of duplicating cached data across each web node and the extra storage space that incurs. We get around this at dealnews.com by using a single NFS server for cache and content storage.
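
As a concrete example of a hard expiration time, the check can be as simple as comparing the cache file's modification time against a maximum age before trusting it. This is a minimal sketch assuming an arbitrary five minute lifetime; cache_is_fresh() is a hypothetical helper, not part of the cache library shown later.

PHP:

  // Hypothetical helper: only trust a cache file that was written
  // within the last $max_age seconds.
  function cache_is_fresh($filename, $max_age=300){
      return file_exists($filename) &&
             (time() - filemtime($filename)) < $max_age;
  }

  // example use: rebuild the cache when it is missing or stale
  if(!cache_is_fresh("./data/frontpage.cache")){
      // regenerate the cache file here
  }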

What data should be cached?

Not all types of data are good candidates for caching. To determine whether the content on your site can benefit from caching, ask yourself two questions:

  1. How often is the content requested?
  2. How often does the content change?

Content that doesn't change from moment to moment, content that is viewed often (such as the front page) and repetitive or complex database queries are excellent candidates for caching. Sites that would benefit from caching include news sites, publications and other article-based sites, and forums.

Content that would not necessarily benefit from being cached includes data that changes from moment to moment, pages that are rarely viewed, and searches that aren't often repeated. In addition, searching the cache itself is usually a bad idea, as it defeats one of the larger purposes of using a database.

So, How Do I Cache Data?

Converting sites that use straight database access is fairly easy. The following code is an example of how you might display the last 10 articles on the front page of a news publication. This code repeats the same query for each user, even if the data doesn't change.

PHP:

  // connect to the MySQL server
  $conn=mysql_connect("localhost", "apachecon", "apachecon");

  // fetch the ten most recent articles
  $SQL="Select id, headline, body, creation_time
       from articles order by creation_time desc limit 10";
  $res=mysql_db_query("apachecon", $SQL, $conn);

  // loop through the results and send them to the browser
  while($rec=mysql_fetch_assoc($res)){
      echo "<b>$rec[headline]</b><br />$rec[body]<p>\n";
  }

Perl:

#!/usr/bin/perl -w

use DBI;

# connect to the MySQL server
my $dsn = "DBI:mysql:database=apachecon;host=localhost;";
my $dbh = DBI->connect($dsn, "apachecon", "apachecon");

# fetch the ten most recent articles
my $sql = "Select id, headline, body, creation_time
           from articles order by creation_time desc limit 10";
my $sth = $dbh->prepare($sql);
$sth->execute;

# loop through the results and send them to the browser
while (my $row = $sth->fetchrow_hashref) {
    print "<b>$row->{headline}</b><br>$row->{body}<p>\n";
}

$sth->finish;
$dbh->disconnect;

There are several problems with the above method. The first is that if your database becomes unreachable, you will see no content. The second is not quite as easy to see. The example query above is very simplistic and doesn't really exercise the resources of the database engine. A real site might spread its content across several tables that store article headers, article bodies, authors and search keywords. A more complex join would then be needed to retrieve the information to display, slowing down the time it takes to get the content to the user.
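
For example, with a hypothetical schema that splits headlines, bodies and authors into separate tables, the front page query might grow into a join along these lines (the table and column names are made up for illustration):

  select a.id, a.headline, b.body, au.name, a.creation_time
    from article_headers a, article_bodies b, authors au
   where b.article_id = a.id
     and au.author_id = a.author_id
   order by a.creation_time desc
   limit 10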

Instead of repeatedly asking the database to give us the same information, why not store the information in a reusable format that we can quickly access without the overhead of connecting and querying the database?

There are two ways to do this. The first is to capture the data returned from the database and store it in a reusable form. The second is to capture the output generated by our scripts and store it for later use.

Three functions were created to help read and write cache files as needed: write_cache(), read_cache() and get_db_cache().

PHP:

// cache.php - caching utility functions

// set error reporting to include the trigger_error types
error_reporting(error_reporting() | E_USER_ERROR | E_USER_WARNING | E_USER_NOTICE);

// the web server will need to be able to write to this dir.
$CACHE_DIR="./data";

////
//! Write out serialized data.
//
//  write_cache uses serialize() to store $var in $filename.
//  It returns true on success.
//
//  $var      -  The variable to be written out.
//  $filename -  The name of the file to write to.

function write_cache(&$var, $filename){

    GLOBAL $CACHE_DIR;

    $filename="$CACHE_DIR/$filename";

    $success=false;

    // try to open the file
    if($fp=fopen($filename, "w")){
        // write serialized data
        fputs($fp, serialize($var));
        fclose($fp);
        $success=true;
    }

    return $success;
}

////
//! Read in serialized data.
//
//  read_cache reads the serialized data in $filename and
//  fills $var using unserialize().  It returns true on success.
//
//  $var      -  The variable to be filled.
//  $filename -  The name of the file to read.

function read_cache(&$var, $filename){

    GLOBAL $CACHE_DIR;

    $filename="$CACHE_DIR/$filename";

    $success=false;

    // try to open file
    if($fp=@fopen($filename, "r")){

        // read in serialized data
        $szdata=fread($fp, filesize($filename));
        fclose($fp);

        // unserialize the data
        $var=unserialize($szdata);
        $success=true;
    }

    return $success;
}

////
//! Get data from the cache or the database.
//
//  get_db_cache checks the cache for cached SQL data in $filename
//  or retrieves it from the database if the cache is not present.
//  It returns true on success.
//
//  $SQL      -  The SQL query to execute if needed.
//  $filename -  The name of the cache file.
//  $var      -  The variable to be filled.
//  $refresh  -  Optional.  If true, do not read from the cache.

function get_db_cache($SQL, &$var, $filename, $refresh=false){

    $var=array();

    // check for the refresh flag and try to read the cached data
    if($refresh || !read_cache($var, $filename)){

        // Didn't get the cache, so go to the database.
        // connect to the database
        $conn=mysql_connect("localhost", "apachecon", "apachecon");

        // execute the query
        $res=mysql_db_query("apachecon", $SQL, $conn);
        if($err=mysql_error())
            trigger_error($err, E_USER_ERROR);

        // loop through the results and add them to an array
        while($rec=mysql_fetch_assoc($res)){
            $var[]=$rec;
        }

        // write the data to the file
        write_cache($var, $filename);
    }

    // report success so callers can test the return value
    return true;
}

Perl:

#!/usr/bin/perl -w

# cache.pl - caching utility functions

use Storable qw(store retrieve);
use DBI;
use strict;

my $dbh;
my $dsn = "DBI:mysql:database=apachecon;host=localhost;";
my $CACHE_DIR = "./data";


####
# Write out serialized data.
#
# write_cache uses the Perl module Storable
# to store $var as serialized data in $filename.
#
# write_cache returns true on success.
#
# $var      -  Data to be written out (must be a reference)
# $filename -  The name of the cache file to write

sub write_cache {
    my ($var, $filename) = @_;

    $filename = "$CACHE_DIR/$filename";
    if (store($var, $filename)) { # store expects a reference
        return(1);
    }

    return(0);
}

####
# Read in serialized data.
#
# read_cache reads the serialized data in $filename and
# returns a reference to the unserialized data.
#
# $filename -  The name of the cache file to read.

sub read_cache {
    my ($filename) = @_;

    $filename = "$CACHE_DIR/$filename";
    if (-e $filename) {
        return( retrieve($filename) );
    }

}

####
# Retrieve data from the cache or database.
#
# get_db_cache checks the cache dir for cached SQL data in $filename
# or retrieves it from the database if the cache is not present.
#
# get_db_cache returns a reference to an array of SQL data
#
# $SQL      -  The SQL query to execute if necessary.
# $filename -  The name of the cache file to use
# $refresh  -  Optional.  If true, do not read from the cache.

sub get_db_cache {
    my ($sql, $filename, $refresh) = @_;
    my $var;

    if ($refresh || !defined($var = read_cache($filename))) {

      $dbh = DBI->connect($dsn, "apachecon", "apachecon");
      my $sth = $dbh->prepare($sql);
      $sth->execute;

      while (my $row = $sth->fetchrow_hashref) {
          push @$var, $row;
      }

      $sth->finish;
      $dbh->disconnect;

      write_cache($var, $filename);
    }

    return($var);

}

write_cache() takes a variable and a filename and writes a serialized version of the variable to the cache file. Instead of actually creating a special data format for our cache files, we simply store the variable in the file. This greatly simplifies the process of storing and retrieving our cached data and allows us to focus on displaying the content.

read_cache() is the opposite of write_cache(). It recreates a variable previously stored in a cache file. (Please note that due to language differences the PHP and Perl version of read_cache() differ slightly in the way that they return the cached data.)

get_db_cache() is a generic wrapper for retrieving cached data. If the cached data does not exist, get_db_cache() will use the SQL parameter to directly query the database to retrieve the data. It will also bypass the cache if the refresh parameter is true. (Please note that due to language differences the PHP and Perl version of get_db_cache() differ slightly in the way that they return the cached data.)

Using the above functions, let's convert the first example into a cached version:

PHP:

include "cache.php";

$SQL="Select id, headline, body, creation_time
      from articles order by creation_time desc limit 10";

get_db_cache($SQL, $articles, "articles.cache");

//loop through the results and create the output
foreach($articles as $art){
    echo "<b>$art[headline]</b><br>$art[body]<p>\n";
}

Perl:

#!/usr/bin/perl -w

use strict;
require("cache.pl");

my $sql = "select id, headline, body, creation_time
           from articles order by creation_time desc limit 10";

my $articles = get_db_cache($sql, "articles.cache");

foreach(@$articles) {
    print "<b>$_->{headline}</b><br>$_->{body}<p>\n";
}

Now our example only queries the database if there is no cached data available. Once the data is generated the first time, it no longer needs to be regenerated on each hit.

Let's take this one step further. In addition to caching the repetitive database query, we can also cache the actual output itself instead of looping through our articles array each time. Obviously this is a bit of overkill for this small example, but the method can be very effective for large, complex site designs.

PHP:

include "cache.php";

// Check for cached output
if(!read_cache($output, "article_output.cache")){

    $SQL="Select id, headline, body, creation_time
          from articles order by creation_time desc limit 10";

    // no cached output so get the data
    get_db_cache($SQL, $articles, "articles.cache");

    //loop through the results and create the output
    foreach($articles as $art){
        $output.="<b>$art[headline]</b><br>$art[body]<p>\n";
    }
    // write the output cache
    write_cache($output, "article_output.cache");
}

// send our output to the browser
echo $output;

Perl:

#!/usr/bin/perl -w

use strict;

require("cache.pl");

my $output;
# Check for cached output
if (!defined($output = read_cache("article_output.cache"))) {

  # no cached output so get the data
  my $sql = "select id, headline, body, creation_time
            from articles order by creation_time desc limit 10";
  my $articles = get_db_cache($sql, "articles.cache");

  # loop through the results and create the output
  foreach(@$articles) {
    $$output .= "<b>$_->{headline}</b><br>$_->{body}<p>\n";
  }
  # write the output cache
  write_cache($output, "article_output.cache");

}

# send our output to the browser
print $$output;

Caching in the Real World

The caching examples above are effective, but having SQL queries scattered throughout your code and needing to remember the file names of your cache files is too complex. What we did was wrap our utility functions in functions that each perform a specific task. For example, if you need an array of all the authors in your system, why not create a get_authors() function rather than remembering the SQL statement and cache file name? It might look like this.

PHP:

function get_authors(&$authors, $refresh=false){

    $ret=true;

    if($refresh || !read_cache($authors, "authors.cache")){

        $authors=array();
        $SQL = "select author_id, name, email from authors";

        if($ret=get_db_cache($SQL, $tmp, "authors.db.cache", $refresh)){
            foreach($tmp as $author){
                $authors[$author["author_id"]]["name"]=$author["name"];
                $authors[$author["author_id"]]["email"]=$author["email"];
            }
        }

        write_cache($authors, "authors.cache");
    }

    return $ret;
}

Perl:

sub get_authors {
  my ($refresh) = @_;

  my $authors;
  if ($refresh || !defined($authors = read_cache("authors.cache"))) {
    my $sql = "select author_id, name, email from authors";
    $authors = get_db_cache($sql, "authors.cache", $refresh);
  }

  my %auth_hash;
  foreach(@$authors) {
    $auth_hash{$_->{author_id}} = $_;
  }

  return(%auth_hash);

}

This wrapper function can be used in any script, and no one has to remember any SQL or file names. Another good use of a wrapper function is to turn the examples above into a function that generates the front page of your site and can be called from any script. Something similar to this:

PHP:

function get_frontpage(&$output, $refresh=false){

    if($refresh || !read_cache($output, "frontpage.cache")){

        get_authors($authors);

        $SQL="Select id, headline, body, " .
             "date_format(creation_time, '%m/%d/%Y %h:%i %p') as date, " .
             "author_id from articles order by creation_time desc limit 10";

        get_db_cache($SQL, $articles, "frontpage.sql.cache", $refresh);

        foreach($articles as $article){
            $email=$authors[$article["author_id"]]["email"];
            $author=$authors[$article["author_id"]]["name"];
            $output .= "<tr>\n";
            $output .= "  <td>\n";
            $output .= "  <font face=\"Arial\">";
            $output .= "  <b>$article[headline]</b>";
            $output .= "  </font><br>\n";
            $output .= "      <font face=\"Arial\" size=\"-2\">";
            $output .= "$article[date] by";
            $output .= "<a href=\"mailto:$email\">$author</a></font>";
            $output .= "  </td>\n";
            $output .= "</tr>\n";
            $output .= "<tr>\n";
            $output .= "  <td>";
            $output .= "<font face=\"Arial\" size=\"-1\">";
            $output .= "$article[body]";
            $output .= "</font></td>\n";
            $output .= "</tr>\n";
            $output .= "<tr><td> </td></tr>\n";
        }

        write_cache($output, "frontpage.cache");

    }
}

Perl:

sub get_frontpage {
  my ($refresh) = @_;

  my %authors = get_authors();

  my $output;
  if ($refresh || !defined($output = read_cache("frontpage.cache"))) {

    my $sql = "select " .
              "id, author_id, headline, body, creation_time, " .
              "date_format(creation_time, '%m/%d/%Y %h:%i %p') as article_date " .
              "from articles order by creation_time desc limit 10";
    my $articles = get_db_cache($sql, "frontpage.sql.cache");

    foreach(@$articles) {
      $$output .= "<tr>\n";
      $$output .= "  <td><font face=\"Arial\"><b>$_->{headline}</b>\n";
      $$output .= "    <font size=\"-1\">  $_->{article_date} ";
      $$output .= "    by <a href=\"mailto:$authors{$_->{author_id}}{email}\">";
      $$output .= "    $authors{$_->{author_id}}{name}";
      $$output .= "    </a></font></font>";
      $$output .= "  </td>\n";
      $$output .= "</tr>\n";
      $$output .= "<tr>\n";
      $$output .= "  <td><font face=\"Arial\">$_->{body}</font></td>\n";
      $$output .= "</tr>\n";
      $$output .= "<tr><td> </td></tr>\n";

      }

    write_cache($output, "frontpage.cache");

  }

  return($$output);

}

Keeping A Fresh Cache

The biggest problem with caching is making sure that everything is up to date. There are a few methods you can use to refresh your cache.

One method, in the scripts that update the database, is to simply run functions like those in the section above with the refresh parameter set to true. By doing this, your cached data is automatically regenerated and is always up to date. If you keep caching in mind when writing scripts that update the database, you can save yourself some headaches.
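
For instance, the script that posts a new article can rebuild the affected caches as its last step, so the next visitor never pays the cost. A quick sketch, assuming the get_authors() and get_frontpage() wrappers from the previous section are available:

PHP:

  // post_article.php (hypothetical sketch) - after inserting a new
  // article into the database, rebuild the caches that depend on it.
  include "cache.php";

  // ... code that inserts the new article goes here ...

  // force the wrappers to bypass the old cache and regenerate it
  get_authors($authors, true);
  get_frontpage($output, true);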

Another method is to have a cache clearing script that runs at some interval and removes old cache files. This is a little cruder than the first method, but it does ensure that your cache never gets too old. Also, if disk space is a concern, this method keeps unused cache files from eating up your available drive space.
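
A minimal version of such a script, meant to be run from cron, might simply walk the cache directory and remove anything older than a chosen age. The directory and lifetime below are examples only:

PHP:

  // clear_cache.php (hypothetical sketch) - remove cache files that
  // are more than an hour old.  Run periodically from cron.
  $cache_dir = "./data";
  $max_age   = 3600;  // seconds

  $dir = opendir($cache_dir);
  while(($file = readdir($dir)) !== false){
      $path = "$cache_dir/$file";
      if(is_file($path) && (time() - filemtime($path)) > $max_age){
          unlink($path);
      }
  }
  closedir($dir);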

Performance

There is a common misconception around the net that files are bad and databases are good, and it's very prevalent in the PHP community. The first time caching is mentioned to someone, they almost always scoff at the idea. Even after we relate first-hand experience with databases and caching systems, some people still swear that databases are faster.

Caching is not a database replacement. It's not meant for searching, permanent storage or complex querying. Databases will always be faster and more flexible in those areas. A cache file has one specific purpose: to store repetitive data in a reusable format. Either you want all the data in the cache file or you don't need the cache file at all.

Benchmarks are still the best way to show what a given method of data storage or retrieval can do. We ran the first scripts written for this paper through several benchmarks. The first script was a simple query and loop that displayed results in the browser. The second used all of our previously generated cache files to do the same.

The server used for the benchmarks was a Gateway PIII 750 with 128MB of RAM used for development here at dealnews.com. It served as both the web server and the database server (a fairly common setup for small to medium size sites). We used ApacheBench, Version 1.3c, which is distributed with Apache, to run 10,000 requests with 8 concurrent connections. In both cases the server load was allowed to dip below 0.1 before starting. Here is a summary of the results.
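
For reference, each run was driven by an ApacheBench command along these lines (the hostname shown is a placeholder):

  ab -n 10000 -c 8 http://yourserver/apachecon/sample1.php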

Method              Total Time(s)        Requests/second
--------------------------------------------------------
Not Cached                 53.893                 185.55
Cached                     39.598                 252.54

As you can see, the cached version was much faster and more efficient even though it actually had to parse more PHP code. To be fair, though, we wondered what would happen if the web server did not also have to bear the load of the database server. Given that this is how we operate in production, we felt it only fair to find out, so we waited for a down time and ran the test against our production servers. The system consists of five servers, three of which we used: a web server (PIII 800/512MB RAM), a MySQL server (Dual PIII 850/1GB RAM) and a third box, configured like the MySQL box, to run ApacheBench. These are the results.

Method              Total Time(s)        Requests/second
--------------------------------------------------------
Not Cached                 40.208                 248.71
Cached                     38.342                 260.81

The separate and faster database server did help. However, the cached version was still faster.

Lastly, we ran the test against our web server cluster. It consists of three web servers (PIII 750/256MB RAM), an NFS server and a database server (both Dual PIII 850/1GB RAM). Here are those results.

Method              Total Time(s)        Requests/second
--------------------------------------------------------
Not Cached                 39.116                 255.65
Cached                     31.832                 314.15

The most important point to take from these figures is that using a file cache is not slower than the database. In fact, it is faster.

Note: The full output from the ApacheBench test is in the appendix of this document.

Conclusion

So, hopefully by now you are a believer like we are. Caching is a great method for getting the most out of your servers without asking the most from your servers. When used correctly and efficiently, it can be more dependable, just as fast (if not faster), and just as flexible as getting data directly from the database.

For more on this topic, visit http://dealnews.com/developers/. Any new developments and new code examples will be posted there in the future.

Appendix A. ApacheBench Results

# This is results from the non-caching script
# The server was both the web server and the db server

This is ApacheBench, Version 1.3c <$Revision: 1.41 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Server Software:        Apache/1.3.17
Server Hostname:        XXXXXX
Server Port:            80

Document Path:          /apachecon/sample1.php
Document Length:        6028 bytes

Concurrency Level:      8
Time taken for tests:   53.893 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      61990000 bytes
HTML transferred:       60280000 bytes
Requests per second:    185.55
Transfer rate:          1150.24 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0     0    25
Processing:    14    42  1569
Total:         14    42  1594


# This is results from the caching script
# The server was both the web server and the db server

This is ApacheBench, Version 1.3c <$Revision: 1.41 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Server Software:        Apache/1.3.17
Server Hostname:        XXXXXX
Server Port:            80

Document Path:          /apachecon/sample2.php
Document Length:        6548 bytes

Concurrency Level:      8
Time taken for tests:   39.598 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      67190000 bytes
HTML transferred:       65480000 bytes
Requests per second:    252.54
Transfer rate:          1696.80 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0     2  2998
Processing:     3    29   432
Total:          3    31  3430


# This is results from the caching script
# The server acted only as the web server

This is ApacheBench, Version 1.3c <$Revision: 1.44 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2000 The Apache Group, http://www.apache.org/

Server Software:        Apache/1.3.14
Server Hostname:        XXXXXX
Server Port:            80

Document Path:          /apachecon/sample2.php
Document Length:        6028 bytes

Concurrency Level:      8
Time taken for tests:   38.342 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      62210000 bytes
HTML transferred:       60280000 bytes
Requests per second:    260.81
Transfer rate:          1622.50 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0     0  2998
Processing:     4    30    29
Total:          4    30  3027


# This is results from the non-caching script
# The server acted only as the web server

This is ApacheBench, Version 1.3c <$Revision: 1.44 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2000 The Apache Group, http://www.apache.org/

Server Software:        Apache/1.3.14
Server Hostname:        XXXXXX
Server Port:            80

Document Path:          /apachecon/sample1.php
Document Length:        6028 bytes

Concurrency Level:      8
Time taken for tests:   40.208 seconds
Complete requests:      10000
Failed requests:        6024
   (Connect: 0, Length: 6024, Exceptions: 0)
Total transferred:      28915352 bytes
HTML transferred:       26985352 bytes
Requests per second:    248.71
Transfer rate:          719.14 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0     0     4
Processing:     3    31   798
Total:          3    31   802


# This is results from the non-caching script
# This test used the server cluster

This is ApacheBench, Version 1.3c <$Revision: 1.41 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Server Software:        Apache/1.3.17
Server Hostname:        XXXXXX
Server Port:            80

Document Path:          /apachecon/sample1.php
Document Length:        6028 bytes

Concurrency Level:      8
Time taken for tests:   39.116 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      62282454 bytes
HTML transferred:       60292056 bytes
Requests per second:    255.65
Transfer rate:          1592.25 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0     0     9
Processing:     8    30   748
Total:          8    30   757



# This is results from the caching script
# This test used the server cluster

This is ApacheBench, Version 1.3c <$Revision: 1.41 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Server Software:        Apache/1.3.17
Server Hostname:        XXXXXX
Server Port:            80

Document Path:          /apachecon/sample2.php
Document Length:        6028 bytes

Concurrency Level:      8
Time taken for tests:   31.832 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      62270000 bytes
HTML transferred:       60280000 bytes
Requests per second:    314.15
Transfer rate:          1956.21 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0     0    21
Processing:     5    24   920
Total:          5    24   941

Appendix B. E-Interview with Rob Malda


>  1. How does Slashdot use caching?

Umm... we do.  We cache pages. Ummm... anything that we can cache, we do.
Basically the anonymous view of all homepage and articles is cached... that
accounts for a huge percentage of all traffic and saves many clock cycles.
We're actually reducing what we cache in a future version of Slash since it
is somewhat restrictive.

>  2. What led Slashdot to start caching its content? Was it as a result of a
>  performance problem?

Yes.

>  3. If so, did you feel that more hardware would have also solved the problem
>  and if so why did you not go that route?

Some hardware would have helped, but at the time, we had no budget ;)

--
Rob "CmdrTaco" Malda


This is a follow-up to that interview.

> I was hoping to bend your keyboard a bit more. Can you expand on the
> statement "> We're actually reducing what we cache in a future version of
> Slash since it is somewhat restrictive."?
>
> How is it restrictive?

A cached page can't do anything: its fixed content for anyone who views it.

> Is this going to be in the bender release? I am going to that session at
> Apachecon.

I believe its actually set up as an option... it can run the old way, or this
way. The reality is that for sites with fewer then a few thousand page
views a day, its just not important. For sites with tens of thousands of
page views and limited hardware, it starts getting important. But when
you have the hardware, its nice *not* to cache 'cuz you can do more.

