stuff i do at work

(maybe i’ll make fewer mistakes if i remember some)

High throughput server apps and exception handling

Posted by noah on November 25, 2008

(or how I took down the server yesterday)

Yesterday, several smaller failures added up to a single catastrophic one.  We migrated our database to SQL Server 2008, which meant moving to a new database server.  I entered the new address in my app’s config file, started the app, and checked the log for errors.  It was clean, so I went home.

The next Monday, the site was down because it couldn’t receive any quotes from my server.  My application couldn’t reach the SQL server because its name wouldn’t resolve!  The DNS configuration was mistake #1.

Why didn’t the error get logged?  The database calls are only made during market hours, so that code path never ran while I was checking the log.  Assuming that checking the log would be enough was mistake #2.

I had actually anticipated an error like this, but my code wasn’t sufficient.  I caught the exception thrown by the database call, and the code was written to make that call only once a day… except when the call failed.  So I’d attempt the database call every single time a quote was requested (about 500 to 1000 times per second).  Making so many calls was mistake #3.
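A minimal sketch of that flaw, in Python rather than the app’s original language (everything here is hypothetical: FlakyDatabase stands in for the SQL Server call, NaiveDailyCache for the once-a-day logic, and ONE_DAY for the refresh interval). The point it illustrates is that the refresh time is only recorded on success, so a failing call is retried on every request.

```python
import time

ONE_DAY = 24 * 60 * 60  # assumed refresh interval, in seconds


class FlakyDatabase:
    """Hypothetical stand-in for the database call; counts how often it is hit."""

    def __init__(self, reachable):
        self.reachable = reachable
        self.calls = 0

    def fetch_daily_value(self):
        self.calls += 1
        if not self.reachable:
            raise ConnectionError("could not resolve database host")
        return 42.0


class NaiveDailyCache:
    """Refreshes once a day, but only records the refresh time on success."""

    def __init__(self, db):
        self.db = db
        self.value = None
        self.last_success = None  # never set while the call keeps failing

    def get(self, fallback):
        now = time.monotonic()
        if self.last_success is None or now - self.last_success > ONE_DAY:
            try:
                self.value = self.db.fetch_daily_value()
                self.last_success = now
            except ConnectionError:
                # The bug: the failed attempt is not recorded, so the very next
                # request sees a stale cache and calls the database again.
                return fallback
        return self.value


# Simulate 1000 quote requests against a healthy and an unreachable database.
up = FlakyDatabase(reachable=True)
up_cache = NaiveDailyCache(up)
down = FlakyDatabase(reachable=False)
down_cache = NaiveDailyCache(down)
for _ in range(1000):
    up_cache.get(fallback=0.0)
    down_cache.get(fallback=0.0)
print(up.calls, down.calls)  # 1 1000
```

The healthy database is hit once; the unreachable one is hit once per request, which at 500 to 1000 requests per second is exactly the load described above.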

The database call took time to complete.  Even when it was fast, it wasn’t fast enough to handle that volume.  Overwhelmed, the server froze up and stopped responding to requests.  Not anticipating a delay in a critical section of code (critical in both the threading sense and in importance) was mistake #4.

I fixed this with a TryEnter block, so that only one thread at a time could make the database request, and supplied an estimated value if a thread couldn’t enter that section of code or if an exception was thrown.  To avoid hitting the TryEnter line on every request, I used a form of double-checked locking.
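Here is a rough sketch of that pattern, assuming Python’s threading.Lock.acquire(blocking=False) as a stand-in for .NET’s Monitor.TryEnter. GuardedDailyCache, the fetch and estimate callables, and ONE_DAY are all hypothetical names for illustration, not the original code. The cached value and its timestamp live in one tuple that is replaced as a single reference, so the unlocked first check never sees a half-updated entry.

```python
import threading
import time

ONE_DAY = 24 * 60 * 60  # assumed refresh interval, in seconds


class GuardedDailyCache:
    """Only one thread refreshes at a time; everyone else gets an estimate."""

    def __init__(self, fetch, estimate):
        self._fetch = fetch          # hypothetical: the slow database call
        self._estimate = estimate    # hypothetical: cheap fallback value
        self._lock = threading.Lock()
        self._entry = None           # (value, fetched_at), replaced as one object

    def _fresh(self, entry):
        return entry is not None and time.monotonic() - entry[1] < ONE_DAY

    def get(self):
        # First check, without the lock: the common case is a fresh cache.
        entry = self._entry
        if self._fresh(entry):
            return entry[0]
        # TryEnter equivalent: never block waiting for the refreshing thread.
        if not self._lock.acquire(blocking=False):
            return self._estimate()
        try:
            # Second check, under the lock: another thread may have just refreshed.
            entry = self._entry
            if self._fresh(entry):
                return entry[0]
            value = self._fetch()
            self._entry = (value, time.monotonic())
            return value
        except Exception:
            return self._estimate()
        finally:
            self._lock.release()


# Usage: a database call that hangs until we let it finish.
started, release = threading.Event(), threading.Event()
calls = []


def slow_fetch():
    calls.append(1)
    started.set()
    release.wait(timeout=5)
    return 101.5


cache = GuardedDailyCache(slow_fetch, estimate=lambda: 100.0)
worker = threading.Thread(target=cache.get)
worker.start()
started.wait(timeout=5)
answers = [cache.get() for _ in range(1000)]  # lock is held: all estimates
release.set()
worker.join()
print(len(calls), set(answers), cache.get())  # 1 {100.0} 101.5
```

While one thread is stuck in the database call, the other 1000 requests return the estimate immediately instead of piling up. With the database unreachable, each thread that wins the lock still retries, so the lock only guarantees that at most one caller is waiting on the database at any moment.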
