Amongst the many non-geeky conversations that night there were, of course, some very geeky ones, and one of them was a great discussion of Web Scale Applications. In case that term is completely unfamiliar to you, let me explain a little. When you are working at your computer and use an application like Outlook to check your e-mail offline, the application is running locally, just on your computer. The programmers who wrote that program could predict with a reasonable degree of accuracy the amount of work it would be expected to do. They knew, for example, that only one person would be using it at a time, and they could analyse the way people use e-mail to come up with a usable set of figures for how many e-mails people would have listed on screen at any one time. By understanding the expected workload, the programmers can write code that works best at that level.
When you check your e-mail at work you are probably still using Outlook on your local machine, but it will also be connecting to an e-mail server. That server supports all the users within the company, which could be 10, a few hundred or several thousand. This unknown puts a little more stress on the developer, as they have to code for a much wider range of expected load. Such software is said to be Enterprise grade (assuming that it works properly). There is hope for our poor developer though, as it is not unreasonable to state a maximum supported load for such software, for example stating that it will only support up to 2000 users and that beyond that point the company must have a second server to cope.
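To make that last point concrete, here is a tiny sketch of the sort of capacity sum an Enterprise developer can get away with. The function name and the idea that load splits evenly across servers are my own assumptions for illustration; only the 2000-users-per-server figure comes from the example above.

```python
def servers_needed(users: int, users_per_server: int = 2000) -> int:
    """Return how many servers are needed if each one supports a fixed
    maximum number of users (hypothetical helper, illustration only)."""
    if users < 0:
        raise ValueError("users must not be negative")
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    # Ceiling division using integers only; always keep at least one server.
    return max(1, (users + users_per_server - 1) // users_per_server)


if __name__ == "__main__":
    for users in (10, 300, 2000, 2001, 5000):
        print(users, "users ->", servers_needed(users), "server(s)")
```

The sum only works because the upper bound on users is known in advance, which is exactly what a Web Scale Application does not have.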
The classic example of a Web Scale Application is online web mail, such as Hotmail, Yahoo Mail and GMail. These applications have to support a completely unknown level of usage, the maximum of which could be everyone on the Internet, a figure that grows by the minute.
There is little in the way of standard practice when producing a Web Scale Application. It is quite a new field and something very hard to test in a lab, so whatever is out there about this type of work is of great interest to geeks like us. I mentioned a few articles I knew on the subject to people the other night and promised that I would post links to them here, so here they are:
- Tim O'Reilly's Web 2.0 Database War Stories series
- Web 2.0 and Databases Part 1: Second Life
- Database War Stories #2: bloglines and memeorandum
- Database War Stories #3: Flickr
- Database War Stories #4: NASA World Wind
- Database War Stories #5: craigslist
- Baseline: Inside MySpace.com
- Baseline: How Google works
They are all fascinating accounts of how people approached some of the web scale issues in very different ways. A lot more work still needs to be done on all of this, but then it is not something that is easy to assess.