I plan to implement this key-value store as a traditional server that multiple clients connect to over TCP/IP. I'd like to be able to handle as many simultaneous connections as possible while keeping memory usage and processing to a minimum. Creating an operating system thread for each client as it connects is the common approach to this problem. Spinning up potentially thousands of these threads is a concern, though, given the resources that get allocated for each one.
There are a number of arguments that a thread per client is a poor way to handle a large number of concurrent connections. Some languages, like Erlang, implement their own lightweight processes that can be used to handle individual client connections, and this has been a successful approach to the problem. Asynchronous sockets might also be a better way to handle all these connections without needing so many threads.
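To make the "fewer threads" idea concrete, here is a minimal sketch of one common alternative: a single thread that asks the operating system which of many descriptors have data waiting. It assumes the POSIX select() call (Winsock offers a select() of the same shape), and count_readable is a hypothetical helper name of my own, not part of any library. It illustrates readiness multiplexing only and is not a full asynchronous socket server.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any of the n descriptors
 * in fds to become readable, and return how many are ready.
 * Returns -1 if select() itself fails. */
int count_readable(const int *fds, size_t n, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int max_fd = -1;
    int ready = 0;
    size_t i;

    assert(fds != NULL && timeout_ms >= 0);

    FD_ZERO(&readable);
    for (i = 0; i < n; i++) {
        /* select() can only watch descriptors below FD_SETSIZE */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readable);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* One blocking call covers every descriptor; no per-client thread */
    if (select(max_fd + 1, &readable, NULL, NULL, &tv) < 0)
        return -1;

    /* select() leaves only the ready descriptors set */
    for (i = 0; i < n; i++) {
        if (FD_ISSET(fds[i], &readable))
            ready++;
    }
    return ready;
}
```

A server loop built on this idea would call select() repeatedly and then read from each descriptor that is reported ready. The number of descriptors select() can watch is capped by FD_SETSIZE, so a server aiming at many thousands of connections would more likely look at epoll on Linux or I/O completion ports on Windows.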
For my part, I'd like to know what the limits are in C on the number of concurrent threads that can be started on Windows, so I've written the simple code below to answer this question.
#include <windows.h>
#include <stdio.h>

// Thread procedure: each thread just sleeps forever so it stays alive
static DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
    (void)lpParameter;  // unused
    Sleep(INFINITE);
    return 0;
}

// Main program
int main(void)
{
    int i;

    // Keep creating threads until CreateThread fails
    for (i = 0; i < 1000000; i++)
    {
        DWORD id;
        HANDLE h = CreateThread(NULL, 0, ThreadProc, NULL, 0, &id);
        if (!h) break;
    }
    printf("Created %d threads\n", i);
    getchar();  // wait for a key press before exiting
    return 0;
}
This code attempts to create 1 million threads. In reality it only manages 2933 threads on my laptop. That changes once I modify the CreateThread call as follows.
CreateThread(NULL, 4096, ThreadProc, NULL, STACK_SIZE_PARAM_IS_A_RESERVATION, &id);
After making this change, which reduces the stack size requested for each thread, we manage to create 12477 threads. That isn't too bad, considering that it's Windows 7 with only 4GB of memory on an Intel Celeron Dual-Core CPU T3100 @ 1.90GHz.
Thursday, 3 May 2012
Wednesday, 2 May 2012
To C or not to C.
I've been writing code for over 23 years, and have earned a living as a software developer for 15 of them, after spending 3 years gaining a Computer Science degree. I cut my teeth on a number of different programming languages: Basic, 6502 Assembly, Pascal, Miranda, Modula-3, Occam, 68k Assembly and C, before moving on to C++, Delphi, PHP, VB6, VB.Net, C#.Net, JavaScript, Java and Erlang. No doubt I've missed some.
Windows and Linux have always been the operating systems I've worked with, though I've worked on a mixture of applications: low-level device drivers, kernel modules, user applications, services, web services, websites and middleware. Even after all this experience I still come back to the fundamental question of which programming language to pick for my next personal programming project.
I guess that leads me to try to specify what it is I wish to develop. I intend to implement a horizontally scaling, distributed, in-memory key-value store. This will be an enterprise-ready application, so a focus on reliability and speed is important. It needs to run 24/7, and there can be no shutdown for maintenance or slowdowns caused by things like garbage collection.
So I think I've decided. C is my choice. It gives me the most control over this application's destiny. If there's a problem with performance or a memory leak, it will be mine. That's bad in some ways, but as these bugs are mine they will also be mine to fix. At least with C anything can be fixed; it can't get buried in some third-party library that I have no control over.