Catalyst scales LAMP and Drupal to meet enterprise demands
An Interview with Mike O'Connor - Catalyst IT New Zealand
This interview with Catalyst's Director Mike O'Connor sheds light on how enterprise scaling of LAMP and the use of CMS platforms including Drupal and Moodle can dramatically benefit companies.
Catalyst IT is a New Zealand-based company that began in 1997, when open source technologies, applications and awareness were still in their infancy. They chose open source as their key technology stack as it gave easy access to a wealth of technologies.
OPEN SOURCE AS AN ENTERPRISE SOLUTION
REALLYLINUX:
Catalyst has been implementing solutions around open source technologies probably longer than almost any other IT company in New Zealand. Can you share some of the highlights and the most difficult challenges?
Catalyst's Mike O'Connor: Any services-based business goes through good and bad times, and we've been through enough challenging periods in our history to appreciate what we have achieved. By staying committed to our open source values, we have been able to differentiate ourselves in the market, and attract dedicated and highly skilled geeks who share and maintain our passion for open source technologies and the values that surround them.
To spread our message, we have been and always will be very active around issues that affect the freedoms offered by open source.
We look around sometimes and reflect on the fact that Catalyst operates NZ's Electoral Roll, supplies the systems for our national general election, operates NZ's .nz domain name registry, operates our national wagering site, and services multiple online media services and a global list of Moodle, Totara, Drupal, Mahara and Koha clients.
It's only been possible through our choice of open source, and the flow-on effects of that decision.
DATABASE SCALING
REALLYLINUX:
Mike, can you perhaps speak to the specific critical facets you see
for effectively scaling the mid-tier (database) for web sites?
Mike O'Connor: While Postgres is our typical choice for a database back-end component, we also use MySQL for some Drupal deployments.
However, with respect to scaling a database layer, in both cases the task comes down to careful choice of appropriate hardware (amount of memory, CPU cores, I/O subsystem type and layout), followed by tuning various parameters for the database product concerned. Both MySQL and Postgres scale very well with the usual number of CPU cores available in mainstream hardware.
A limited degree of scale-out is possible by means of replication to one or more additional nodes, but currently neither of the two database products above has a true "cloudy" configuration (multi-master sharded, or single-master self-managing sharded) that we can generally use (note that our applications are typically not certified with MySQL Cluster).
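As an illustration of the replication-based scale-out described above, here is a minimal Python sketch of read/write splitting: writes go to a single primary and reads rotate across replicas. The class name, host names and the SELECT-only classification rule are assumptions made for the example; the interview does not describe how Catalyst routes queries.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads simply go to the primary.
        self._read_cycle = itertools.cycle(list(replicas) or [primary])

    def node_for(self, sql):
        # Naive classification (assumption): only statements that start
        # with SELECT are treated as reads. Real routing must also handle
        # transactions, SELECT ... FOR UPDATE and replication lag.
        words = sql.split(None, 1)
        if words and words[0].upper() == "SELECT":
            return next(self._read_cycle)
        return self.primary


# Host names below are hypothetical.
router = ReplicatedRouter("db-primary", ["db-replica-1", "db-replica-2"])
for statement in ("SELECT title FROM node", "UPDATE node SET status = 1"):
    print(statement, "->", router.node_for(statement))
```

Because every write still lands on one node, this arrangement only relieves read load, which is why the scale-out it offers is limited.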
REALLYLINUX:
There are many companies facing the prospect of expanding their mid-tier due to scaling and growth. Can you provide a glimpse into the infrastructure that runs large database-driven websites like stuff.co.nz? Servers, VMs, how do you manage load balancing or issues around scaling a "multi-site" Drupal environment?
Mike O'Connor: Catalyst typically deploys high-traffic sites on a multiple-tier architecture, using free software throughout.
The application server tier consists of a number of servers, running whatever software stack is appropriate for the site in question. These application servers are load balanced using an IPVS (http://kb.linuxvirtualserver.org/wiki/IPVS) instance, generally configured on the external firewall.
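To make the load-balancing step concrete, the following Python sketch models least-connection scheduling, one of several scheduling algorithms IPVS provides. It is a toy model of the idea only, not IPVS itself, and the interview does not say which scheduler Catalyst configures; the class and server names are hypothetical.

```python
class LeastConnectionBalancer:
    """Pick the application server with the fewest active connections."""

    def __init__(self, servers):
        # Active connection count per server, in configuration order.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count, so ties
        # are broken by configuration order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to the server finishes.
        if self.active[server] > 0:
            self.active[server] -= 1


# Server names are hypothetical.
balancer = LeastConnectionBalancer(["app1", "app2", "app3"])
first_three = [balancer.acquire() for _ in range(3)]
balancer.release("app2")
print(first_three, "then", balancer.acquire())
```

Tracking open connections rather than rotating blindly helps when some requests are much slower than others, since a busy server naturally receives less new work.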
The back-end database servers will run either MySQL or PostgreSQL, depending on the individual project's requirements. In addition, the back-end tier may also provide additional services like Memcached or Apache Solr.
Finally, in some situations we may deploy a caching tier in front of the application servers. Originally we would use Squid for this caching tier; however, for recent deployments we have used Varnish instead, as we find it is far more flexible.
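The role of a caching tier can be sketched in a few lines of Python: answer repeat requests from a store and only call the application tier when an entry is missing or expired. This is a simplified illustration, not how Squid or Varnish are implemented; the class name, the 60-second lifetime and the `origin` callable are all assumptions.

```python
import time


class PageCache:
    """Serve cached pages until they expire, then refetch from the origin."""

    def __init__(self, origin, ttl_seconds=60.0, clock=time.monotonic):
        self.origin = origin      # callable standing in for the app tier
        self.ttl = ttl_seconds    # lifetime of each cached page (assumed)
        self.clock = clock        # injectable so expiry can be tested
        self._store = {}          # path -> (expiry time, body)

    def get(self, path):
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]       # cache hit: the origin is not touched
        body = self.origin(path)  # miss or expired: ask the app tier
        self._store[path] = (now + self.ttl, body)
        return body


def render(path):
    # Hypothetical application server response.
    return "<html>page for " + path + "</html>"


cache = PageCache(render)
cache.get("/news")         # first request reaches the application tier
print(cache.get("/news"))  # second request is answered from the cache
```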
SCALING DRUPAL
REALLYLINUX:
Catalyst has developed renowned expertise around Drupal. Specifically, can you share some of the facets of ensuring that Drupal sites like ODT.co.nz and SCMP not only handle peak loads now, but can also scale to future needs and adapt to mobile and other platforms? As a follow-up, do you see any limitations with Drupal in this space?
Mike O'Connor: By design, Drupal's incredible flexibility is an architectural priority over performance.
While that makes Drupal capable of many things, high performance often isn't something that comes out of the box. To improve site performance with Drupal 7, for instance, you can apply some of these techniques:
By default, Drupal uses its database as a cache store. Under MySQL (or MariaDB), this is just as fast to read from and write to as Memcache. However, when your database is under high load, you don't want that to affect the performance of your cache, so separating the cache out is a good idea. For the same reason, whatever you use as your cache handler, don't host it on your database server.
Using Drupal's Panels module for page layout and disabling Drupal blocks is a good idea. Panels allows you to set better caching rules around your content than Drupal core does.
Use caching in the Views module. Views can spend a lot of time building a SQL query and even more time rendering the results. Caching at least the query construction can save you a lot of time, especially when a page uses a lot of Views.
Page-level caching is a huge performance win. The Boost module works well for medium-sized sites: it pushes Drupal-generated content to the filesystem, which your web server (Apache2 is supported by default) attempts to serve before falling back to Drupal. The Varnish HTTP accelerator can do a better job, smartly handling many connections and serving cache specific to each client type by obeying HTTP standards that are now also supported in Drupal 7.
Finally, while Drupal can do a lot by itself, you can do more by utilizing your entire technology stack. For example, because Drupal is written in PHP, it has to compile to opcode and bootstrap on each request, which is a lot of overhead if your task could be handled with smarter web server configuration, page caching or local browser caching.
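The point above about serving cache specific to each client type can be sketched with HTTP's Vary response header, which names the request headers a cached response depends on. The Python below builds a cache key from those headers so that, for example, mobile and desktop variants are stored separately. The interview does not say which HTTP mechanism is meant, so treating it as Vary is an assumption, and the `X-Device` header is a hypothetical device-class header rather than anything Drupal or Varnish defines.

```python
def cache_key(method, path, request_headers, vary):
    """Build a cache key that separates variants named by Vary."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [method.upper(), path]
    for name in sorted(header.lower() for header in vary):
        # A header missing from the request contributes an empty value.
        parts.append(name + "=" + headers.get(name, ""))
    return "|".join(parts)


# "X-Device" is a hypothetical header carrying a device class.
mobile = cache_key("GET", "/news", {"X-Device": "mobile"}, ["X-Device"])
desktop = cache_key("GET", "/news", {"X-Device": "desktop"}, ["X-Device"])
print(mobile)
print(desktop)
```

Headers that are not listed in Vary, such as cookies in this sketch, do not affect the key, so the number of stored variants stays small.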
DRUPAL LESSONS LEARNED
REALLYLINUX:
What is your biggest lesson learned with regard to use and development in the Drupal context? You've also done a lot of work in the Moodle space (with your Totara). Can you briefly share what determined your choice with regard to this?
Mike O'Connor: We chose Moodle and Drupal due to their feature sets, and their fit with the technology stacks we were experts in.
Both Moodle and Drupal are large code-bases and application architectures and we accept their architectural and performance designs, good and bad, and just work with them.
They're both fantastic examples of the power of the open source paradigm, with world-wide usage that far exceeds that of all their proprietary competitors. Our focus is using them and improving them: we have staff associated with each project, and we insist that any improvements or enhancements we make are sent upstream to keep them evolving.
LAMP STACK SCALING
REALLYLINUX:
Finally, can you provide some insight into how you scale the LAMP stack (in your case Postgres) and issues you've found when dealing with peak volume or fluctuating peaks on infrastructure? How do you manage these variances, or deal with excess bandwidth vs. peak loads?
Mike O'Connor: The biggest lesson learned in delivering great performance on large-scale systems (200,000+ users) has been testing our assumptions: finding the bottlenecks and then fixing them. On very large systems bottlenecks can be in strange places, such as I/O buses on the servers, networking, or database queries that perform fine on most systems until you put 2 million users on them.
Drupal, Moodle and Totara are all open source web applications and run on very similar technologies; therefore, as we learn new techniques to optimise a Drupal system, the lessons are often equally applicable to Moodle and Totara. Getting great large-scale performance out of these systems requires understanding the underlying technologies so you can architect an optimum system that delivers on the web application's performance requirements.
We have had several instances where customers have come to us with poor performance because their systems were installed on virtual servers or cloud infrastructure without taking into consideration important bottlenecks such as disk I/O and CPU contention. Cloud and virtual infrastructure are great solutions; however, you need to understand the limitations of the infrastructure and networks you are using if you are to guarantee your performance requirements.
Scaling systems such as Moodle, Totara and Drupal requires building from the base up: understanding your performance requirements and infrastructure, and then testing your assumptions to implement solutions to overcome the unknown bottlenecks.
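One simple way to start testing assumptions like these is to time an operation repeatedly and look at the slow tail, not just the typical case. The Python sketch below is a generic illustration rather than Catalyst's tooling; the function names, the choice of median and 95th percentile, and the stand-in workload are assumptions.

```python
import statistics
import time


def summarize(samples):
    """Return the median, 95th percentile (nearest rank) and maximum."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank percentile, computed with integer arithmetic.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def measure(operation, runs=100):
    """Time `operation` repeatedly and summarize the durations in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return summarize(samples)


# Stand-in workload; in practice this would be a page request or query.
print(measure(lambda: sum(range(10000)), runs=50))
```

A large gap between the median and the 95th percentile is often the first hint of contention somewhere in the stack, and such gaps tend to appear only under realistic load.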
REALLYLINUX:
Thank you for taking the time to share these details, as we know that your experiences and expertise in Open Source deployment and scaling will benefit many in the community.
For further information regarding the technologies and companies discussed in this interview please see:
http://catalyst.net.nz/
http://drupal.org/
http://www.totaralms.com/
http://www.postgresql.org/
http://moodle.org/
This interview provided courtesy of Catalyst IT Ltd. a New Zealand company, and is published by permission 2012.