Fundamentals of Massive Linux Scaling -- reallylinux.com
Fundamentals of Massive Linux Scaling
by Mark Rais, senior editor ReallyLinux.com
This introductory article encapsulates four of the top
things I learned about major enterprise Linux scaling. There are many other essential ingredients, and I express some of them in my Scaling Linux Servers article. However, for the time being I
think these four convey the first rung, or foundational aspects for a
robust scaled server complex using Linux. They also address issues I
see fairly consistently in some Linux enterprises.
My hope is that these
fundamental principles will help some administrators overcome obstacles with their massive Linux scaling projects.
CACHING IS NOT THE SOLUTION
Notice that it can be part of the
solution. I propose, however, that caching can not be the crux of
the solution. I'm referring to multi-tiered infrastructure.
If you have a single tier server complex and you place caching boxes
into the foray of server requests, then perhaps it will work; simple
socket requests for instance.
But when you start to see latency
issues across the network routers, the database I/O, the in-bound
pipes, emphasis on caching is just too single sided in my book.
Instead, I propose that where some
architects literally put all the eggs in the caching basket, others
are wise enough to scale where known latencies already exist, rather
than try to offset these with caching schemes that often fail.
How do I know this?
When my team tried
to use caching as a foundational component of scaling for the
"AOL/CBS Big Brother" website, it just failed at first. We saw
over half million in-bound requests per second come through the pipes and the
whole thing melted. Melted so bad that our miscalculation began
choking out routers way down the pipe. In reality we had failed to
understand two very important aspects regarding caching - besides learning that
a prime time TV show can really juice up webhits.
"We saw over half million in-bound requests per second come through the pipes"
First, caching for content driven
websites is often problematic since certain content pieces are unique
and dynamic to the last minute. So where we should normally have had
the final content delivered and cached, some of the editorial staff
were still working on bits and pieces of content. Some of the
content was essential and caching was of absolutely no use. In more
news oriented sites this is especially an issue. Obviously this is
more a human factor than a hardware factor, but nevertheless
Second, caching works when properly
configured. However, configuring caching for major scaling endeavors
also introduces complexities you ordinarily do not notice. Your
caching schema for an existing infrastructure may simply not address
new peak load characteristics.
In our case, the caching failed
because major schema changes were needed and ultimately proved to be
untestable without true peak load. Even the best testing tools can
not present true peak load evaluations, and thereby fail to uncover
certain key nuances in the schema. The result in our case was that
the caching systems began passing the ball to one another and at such
a high rate that they actually took each other down. The pressures
to introduce a new scheme along with many new components in a
ridiculously short period of time contributed to the failure.
I have found that effective mass scale caching requires time and thorough
testing. Two things that are sometimes hard to come by during a
KEEP DNS ROUND ROBIN FOR THE BIRDS
Well, we can all probably agree on one
thing. Using the DNS server as a primary mechanism for load
balancing sucks. Perhaps my conclusion isn't as eloquent as some
may prefer, but it's based on some hard facts.
First, I've met so many people who
use simple DNS round robin configurations to deal with peak load
balancing and simple scaling, it scares me. If you're getting
1,000 hits per second, maybe this will work - for the time being.
But when you move to serious peak loads, DNS Round Robin falls flat
on its face because it can not address the failure points.
Second, based on the first point, DNS
can't even identify when the server fails on load. Instead, it
keeps passing the requests. And, unless you've written some fail-over
code to pull a downed server off the DNS, it just keeps failing.
I guess the one positive for using DNS,
if the server chokes on a request the DNS just returns a fail rather
than pass the request on or stack it in an exponentially increasing
The main point is that using DNS round
robin isn't a great solution for you whether you hit peak or not.
It simply can not address re-balancing or cutoffs without manual
If you have the money then solutions
with expensive switches like the ones Foundry offers, as an example, makes a whole
lot better sense. I just prefer not to see people dealing with peak
load failures on Friday nights because they have to manually adjust
Someone asked if this includes general
round robin approaches. It all depends on the situation. We once
used a "stupid" round robin that was based on simple code that
alternated which servers would get requests by time. Every few
seconds requests were processed to alternating servers. It actually
worked because we were passing very little data per request. We used
this to do a simple and rather inexpensive load balance for shopping
requests during peak holiday shopping times. The infrastructure
stayed relatively the same with only the addition of the front
caching server managing requests.
But let me say that this scheme fails
when the latency for retrieving something such as large images goes
to an already loaded server and you see requests slow down. Then
every other system on the round robin takes the load right? WRONG.
The next server in line receives the off loaded requests and it too
goes down. Yet again you can see a nice domino affect.
In general round robins work well in
smaller scale environments. But not when dealing with major peak
loads - emphasis on peak. I just as soon keep my distance unless
it's a mid-sized environment.
NFS AND BIG DISK DRAMA
For one reason or another, NFS mounts
are everywhere in the Linux world. There's absolutely nothing
wrong with NFS in most situations. But add up those mount points,
spread them across a server farm, add cross site connectivity or
other complexity and you get into a bit of a muddle.
The biggest issues I've personally
seen on mass scale NFS mount infrastructure is that we either die
because the mount points start failing or because the disk I/O chokes
out when implemented on striped disks.
If you use NFS mount points, there's
nothing bad about it.
If you use RAID 5 striping, there's
nothing bad about it.
If you use large disks, which most
people do to save costs on their storage solutions, there's nothing
But combine large disks, complex
striping with lots of NFS mount points and you reach that wonderful
realm of death by positional latency. It is one of
the biggest pains to uncover and then resolve during a scaling
Instead, I've consistently shared
with people that if you're looking to do some serious scaling with
NFS mounts, please stay away from those huge cheap drives.
several smaller GB drives costs more but reduces the chances that
disk latency combined with striping algorithms will become a culprit
for NFS mount point failures. This may be old news to some,
but there are still many shops mixing these three ingredients in
their daily Linux scaling work.
"stay away from those huge cheap drives"
I love MySQL. I love seeing PHP and
other dynamic web design. It provides for a great tool set and makes
most websites valuable. But there is no question that some great
websites have choked because the architecture placed everything into
dynamic page loads.
I still believe, if you're serving
serious web hits, that reducing the dynamic requests and increasing
the static HTML content is a sure way to beat the peak load scaling
problem. I know this from experience because the solution that the outstanding
Operations staff used to bring back alive the Big Brother website was
to flatten the highly dynamic site into basic html, rdist the pages
to about 220 web servers and run those servers at 105% peak load.
There's nothing wrong with designing
webpages and implementing dynamic websites. But on spike loads the
key hit pages need to have far less dynamic elements that must be
called out of caches, out of databases, out of other server file
stores, and far more html. Yes, good old HTML. The good major
websites utilize this nice balanced approach. Unfortunately, with
today's emphasis not only on dynamic content but also dynamic ads,
the slowdown is perceivable even to non-technical savvy web surfers.
One key ingredient is to isolate non-essential elements of a web page
and use static HTML. Another is to ensure that the page design itself contributes to
load balancing during peaks. For instance, offering navigation on the site that does
not channel all readers into single highly dynamic pages. Instead, the navigation can
create parallel paths, reducing overall requests on single pages, while still granting users content they require. Finally, if you must keep certain elements as dynamic, then ensure that you have normalized your database. Simplifying your DB, cleaning up tables and keys that are redundant or bottlenecks, and avoiding BLOBS (binary large objects) like the plague, will further contribute to peak performance.
Finding the right balance between
dynamic and static is not easy and often leads to feuds between the
operations department trying to scale for peak loads and the design
department looking to create even more compelling content. But I'm
of the opinion finding such a balance is essential for good scaling.
Ultimately, the decisions for scaling
will fall on the shoulders of the operations personnel who are likely
to be called in at the last minute, during crisis, and under
pressure. By keeping these four foundational aspects in mind during
the design phase, you can make even the most severe load spikes
Mark Rais serves as senior editor for reallylinux.com, and has written a number of books and articles related to enterprise Linux and UNIX implementation. He also served as a senior technology manager for America Online, Inc. and Netscape, as well as task force leader for several International non-profit organizations. Today, Rais promotes Linux use worldwide as an integration consultant.