Saturday, 17 May 2008

Amazon AWS - A practical experience - Part 1

Over the last few weeks I have been migrating a site to the Amazon Web Services (AWS) environment. I have now reached the point where I feel I can start writing a series of posts about our experiences. This post is an introduction to that series.


The site I have been migrating is Bejant, a LAMP-based graduate employment social network that I have been working with for the last 3 months. The characteristics of Bejant are as follows:
  • PHP 5.2 based
  • MySQL
  • Apache 2.2
  • Memcached
  • CentOS 5.0
  • Swish-e indexer (for search)
  • Video distribution and conversion (ffmpeg)
The AWS services used in this implementation are:
  • Amazon EC2 - Elastic Compute Cloud
  • Amazon S3 - Simple Storage Service
  • Amazon SQS - Simple Queue Service (message queueing)
We are also evaluating Amazon SimpleDB as a means of persisting working storage between processes, but work in this area is at a very early phase.

The Runtime environment

Before we dive into the details of how we did this port, let's take a moment to list the services that we are attempting to provide:
  • 2 Front-end servers
  • 2 Database servers
  • 1 Test/QA server
  • 1 Developer server
  • 1 Video Processing Server
  • 1 Utility server (ad server, mail-list manager, feed processing pipeline)
We chose to use the RightScale management environment, which for a monthly fee provides monitoring, alerting, and instance management and configuration.

I looked at a few other management tools, such as Scalr and EC2PHP, neither of which provided enough capabilities to reasonably manage the cluster. It is indeed possible to roll your own, but we felt that RightScale gave us an edge and made creating this complex system setup easier and more maintainable.

RightScale provides the following:
  • Replicated database solution
  • Autoscaling
  • Load balancing front ends
  • Monitoring and alerting
  • Multi-server clusters
  • Log file consolidation
  • Automated system administration
  • Dynamic server configuration
Development Requirements

We decided that we wanted an environment that supports the full lifecycle of our development activity, which is predominantly Scrum based. To that end we wanted a production pipeline that moves releases from Development to Test to Live in an organised fashion. Bejant's sprint cycles run on an approximately two-week timeline, during which a number of major and minor feature enhancements are introduced alongside the usual maintenance and bug-fixing activities that are normal for any development team. The reason for the separate Test environment is to isolate the QA folk from the day-to-day change that occurs on a development system, and to allow them to operate their own database with known test accounts and data.

The challenge here is to make sure that the codebase and database schema are aligned at each stage of the pipeline. With a site such as Bejant that is undergoing rapid development, these elements are often quite different in each stage as new features are added and rolled through to production.

To that end we decided that the system would effectively boot each stage from a Subversion repository, which would hold branches that reflect the stages in the pipeline:
  • The dev instances always boot from the trunk, and reflect the current state of the codebase.
  • The test instances boot from trunk, but are set to a particular revision that is deemed to be "in test"; the test engineers can choose which revision to boot an instance from.
  • The live system boots from a branch which represents a released product.
These "stages" are the same "environments" that are embodied in popular frameworks such as Ruby on Rails, Grails and Symfony.
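
As a rough illustration of the stage-to-repository mapping described above, here is a minimal Python sketch that builds the `svn checkout` command an instance might run at boot. It is not our actual boot script: the repository URL, the `branches/live` branch name, and the function name `checkout_command` are all assumptions made up for the example. Only `svn checkout` and its `-r` revision option are real Subversion features.

```python
from typing import List, Optional

# Hypothetical repository root; the real URL is not given in this post.
REPO_ROOT = "https://svn.example.com/site"


def checkout_command(stage: str, dest: str,
                     revision: Optional[int] = None,
                     release_branch: str = "branches/live") -> List[str]:
    """Build the svn command line for a pipeline stage.

    dev  -> head of trunk
    test -> trunk pinned to the revision chosen by the test engineers
    live -> a release branch (the branch name here is an assumption)
    """
    if stage == "dev":
        return ["svn", "checkout", f"{REPO_ROOT}/trunk", dest]
    if stage == "test":
        if revision is None:
            raise ValueError("the test stage needs an explicit revision")
        return ["svn", "checkout", "-r", str(revision),
                f"{REPO_ROOT}/trunk", dest]
    if stage == "live":
        return ["svn", "checkout", f"{REPO_ROOT}/{release_branch}", dest]
    raise ValueError(f"unknown stage: {stage!r}")


if __name__ == "__main__":
    # Print the command a test instance would run for revision 1234.
    print(" ".join(checkout_command("test", "/var/www/site", revision=1234)))
```

The resulting list can be passed to `subprocess.run` on the instance. Keeping the stage as a single parameter means the same boot script can serve all three kinds of server.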

In my next post we will look at some of the basics of AWS and the facilities it provides.
