Thursday 3 May 2012

Have my threads and eat them.

I plan to implement this key value store as a traditional server that multiple clients connect to over TCP/IP.  I'd like to be able to handle as many simultaneous connections as possible while keeping memory usage and processing to a minimum.  Creating an operating system thread for each client as it connects is the common approach to solving this problem.  Spinning up potentially thousands of these threads is a concern, though, given the resources that get allocated for each one.

There are a number of arguments that a thread per client is a poor way to handle a large number of concurrent client connections.  Some languages, like Erlang, implement their own lightweight processes that can be used to handle individual client connections, and this has been a successful approach to the problem.  Using asynchronous sockets might also be a better way to handle all these connections without the need for so many threads.

For my part, I'd like to know what limits there are on the number of concurrent threads a C program can start on Windows.  So I've written the simple code below to answer this question.

#include <Windows.h>
#include <stdio.h>

// Thread procedure: block forever so the thread stays alive
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
    (void)lpParameter;  // Unused

    Sleep(INFINITE);

    return 0;
}


// Main program
int main(void)
{
    int i;

    // Keep creating threads until CreateThread fails.
    // The handles are deliberately never closed; the process exits anyway.
    for (i = 0; i < 1000000; i++)
    {
        DWORD id;
        HANDLE h = CreateThread(NULL, 0, ThreadProc, NULL, 0, &id);

        if (!h) break;
    }

    printf("Created %d threads\n", i);
    getchar();  // Keep the process (and its threads) alive until Enter is pressed
    return 0;
}


This code tries to create 1 million threads.  In reality it only manages 2933 threads on my laptop.  That is, until I modify the CreateThread call as below.

CreateThread(NULL, 4096, ThreadProc, NULL, STACK_SIZE_PARAM_IS_A_RESERVATION, &id);

After making this change, which reduces the stack size reserved for each thread, we manage to create 12477 threads.  That isn't too bad really, considering that it's Windows 7 with only 4GB of memory on an Intel Celeron Dual-Core CPU T3100 @ 1.90GHz.
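
As a rough way to reason about why the stack reservation matters, the sketch below computes an upper bound on the thread count under the simplifying assumption that stack reservations are the only thing consuming the process's user address space.  The helper name estimate_max_threads is hypothetical, and the sizes used with it are assumptions rather than measurements: a 1 MiB default reservation (the usual linker default) and a 64 KiB reservation (assuming the requested 4096 bytes is rounded up to the typical allocation granularity).

```c
#include <stdint.h>

/* Hypothetical helper: upper bound on the number of threads if stack
   reservations were the only consumer of the user address space.
   Real limits also depend on other per-thread costs, so treat the
   result as a ceiling, not a prediction. */
uint64_t estimate_max_threads(uint64_t address_space_bytes,
                              uint64_t stack_reserve_bytes)
{
    if (stack_reserve_bytes == 0)
        return 0;  /* avoid dividing by zero */

    return address_space_bytes / stack_reserve_bytes;
}
```

For an illustrative 2 GiB of address space this gives 2048 threads with a 1 MiB reservation and 32768 with a 64 KiB one, a 16x difference.  The measured improvement above (2933 to 12477) is closer to 4x, which suggests that once the stacks shrink, something other than stack reservations becomes the limiting resource.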

Wednesday 2 May 2012

To C or not to C.

I've been writing code for over 23 years, earning a living as a software developer for 15 of these, after spending 3 years gaining a Computer Science degree. I've cut my teeth on a number of different programming languages: Basic, 6502 Assembly, Pascal, Miranda, Modula-3, Occam, 68k Assembly and C, before moving on to C++, Delphi, PHP, VB6, VB.Net, C#.Net, JavaScript, Java and Erlang.  No doubt I've missed some.

Windows and Linux have always been the operating systems I've worked with, though I've worked on a mixture of applications: low level device drivers, kernel modules, user applications, services, web services, websites and middleware. Even after all this experience I still come back to the fundamental question of which programming language to pick for my next personal programming project.

I guess that leads me to try to specify what it is I wish to develop. I intend to implement a horizontally scaling, distributed, in-memory key value store. This will be an enterprise-ready application, so focusing on reliability and speed is important. It needs to run 24/7, with no shutdowns for maintenance and no slowdowns caused by things like garbage collection.

So I think I'm decided.  C is my choice.  It gives me the most control over this application's destiny.  If there's a problem with performance or a memory leak, it will be mine.  That's bad in some ways, but as these bugs are mine they will also be mine to fix.  At least with C anything can be fixed; it can't get buried in some third party library that I've no control over.