Evolution of a subscription based real-time streaming server
Posted by noah on October 21, 2008
I’m writing a server application which streams data to clients based on filters that they specify. For example, a user can specify that they want to see out-of-market trades on stocks which have a volume above 100k, and under $40. Or just stocks that have just hit their daily high. I wrote a few different versions of this and ran into different problems each time.
For background, the technology I’m using here is WCF, which has one way calls, but are not truly fire and forget. If the connection fails, then an exception is thrown when the message is sent, and much cpu time is taken.
Attempt #1: All requests go into the same thread pool
Each client has a specific callback function which is used to send a quote. It was necessary to decouple the incoming data feed from the quote sending mechanism, and to make sure I was making good use of multiple processors (I’ve found earlier, that WCF can be quite processor intensive). Essentially what I did, was a work item was queued in the threadpool, and had the atomic function of sending that one quote. The problem that I ran into was that with multiple clients, a single bad client can severely impact the system. Eventually, as items get queued and executed, if one connection causes a slowdown for the quote that is being sent, all the available threads will be waiting on callbacks to that one connection. Even though I was catching the exception to flag the connection, many quotes can be sent to that same connection before the exception is thrown.
Attempt #2: Single thread creates async calls to callback functions
When I asked my boss, this is how he said most server apps were designed. I rewrote the code to make the notifications asynchronous, and a single thread was used to kick off the asynchronous operations. Since the exception would never reach me if the connection failed, I also used a synchronous polling mechanism in a separate thread to check if the connection was still okay, and killed the connection if it wasn’t. I discovered that the internal exception handling of the failed send was taking a lot of cpu time, and was unacceptable. Maybe I should have figured out that this solution would not have worked, but I didn’t understand how the async operation was implemented behind the scenes. My boss suggested that it could be at the OS level before I tried it, but now I believe it was a standard IAsyncResult type of operation.
Attempt #3: A dedicated queue and a threadpool work item for each client
I rewrote the code again, this time to solve the problem of one bad client clogging the system. Each client would have a queue of quotes which needed to be sent, and a threadpool work item which would take care of sending the message. If one threadpool work item stopped working, others could still work. I knew that having an individual real thread for each client would be bad, since each thread takes about 1 megabyte, and too many threads causes thrashing at the cpu level. When I thought about this, I realized that it wouldn’t work. Once you go beyond the 25 worker threads, the queued items would be on pause, and would have to wait their turn before they are run. Not exactly real time streaming data for everyone. I tested this by running 400 threadpool work items which would sleep and increment a static value each time. I should have gotten +400 each time, but I didn’t. This design was also inherently flawed in it’s misuse of the threadpool, which is designed to do discrete tasks.
Attempt #4: A dedicated queue for each client, and a single thread that kicks off threadpool work items.
I believe that this is my final and best solution. One thread is used to determine whether quotes are eligible for being sent, and if they are, they are queued in a short queue (items are dequeued if it gets too long), one for each client. Another thread loops through each queue of eligible quotes, sees if there any available, and queues a work item in the thread pool if there is one, then it sleeps (how long depends on the level of throttling). Another strength of this solution is that I can flag a queue for being mid-send, so other threads wouldn’t dequeue from it. This seems similar to the first solution, except that not every eligible quote gets queued… only the ones that should be sent. The first solution queues a work item regardless, and does the throttle checking afterward.