Hello world demystified

Hello all,
in the finest programming tradition, I take it that every time one took a programming course or decided to have a quick look into a programming language, writing a “Hello World” program was one of the first things done.
However, even in such a small program, do you really know what it does behind the scenes? Yiannis blogged about it here, giving a Java test case (and by the way, allow me to extend my welcome!) and Dimitris provided a C equivalent there. However, let’s do a roundup. After whacking some Perl code, I got the following results

Let’s start with the old workhorse, Perl:


################################################################################
The Perl version
################################################################################
strace -c -f -q perl hello.pl
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000029 1 36 13 open
0.00 0.000000 0 10 read
0.00 0.000000 0 1 write
0.00 0.000000 0 23 close
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 time
0.00 0.000000 0 7 7 access
0.00 0.000000 0 4 brk
0.00 0.000000 0 4 3 ioctl
0.00 0.000000 0 1 readlink
0.00 0.000000 0 2 munmap
0.00 0.000000 0 1 uname
0.00 0.000000 0 8 mprotect
0.00 0.000000 0 4 2 _llseek
0.00 0.000000 0 4 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 31 mmap2
0.00 0.000000 0 3 3 stat64
0.00 0.000000 0 22 fstat64
0.00 0.000000 0 1 getuid32
0.00 0.000000 0 1 getgid32
0.00 0.000000 0 1 geteuid32
0.00 0.000000 0 1 getegid32
0.00 0.000000 0 1 fcntl64
0.00 0.000000 0 3 1 futex
0.00 0.000000 0 1 set_thread_area
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.000029 176 29 total
Hello World!


ltrace -c -f perl hello.pl
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
91.07 2.013999 9455 213 calloc
2.15 0.047513 64 733 pthread_getspecific
1.44 0.031752 64 492 memcpy
1.33 0.029410 65 446 malloc
1.08 0.023974 64 370 memset
1.05 0.023194 64 357 strlen
0.33 0.007340 63 115 free
0.31 0.006860 64 106 memmove
0.25 0.005633 1126 5 setlocale
0.19 0.004176 64 65 strrchr
0.15 0.003260 65 50 strchr
0.12 0.002586 63 41 strncmp
0.07 0.001598 61 26 __ctype_toupper_loc
0.07 0.001586 61 26 __ctype_tolower_loc
0.04 0.000961 64 15 pthread_mutex_lock
0.04 0.000951 63 15 pthread_mutex_unlock
0.04 0.000781 65 12 getenv
0.02 0.000466 66 7 realloc
0.02 0.000445 111 4 isatty
0.02 0.000431 107 4 lseek64
0.02 0.000358 119 3 read
0.02 0.000343 114 3 __xstat64
0.02 0.000332 66 5 __errno_location
0.01 0.000307 61 5 pthread_mutex_init
0.01 0.000258 64 4 __sigsetjmp
0.01 0.000243 121 2 open64
0.01 0.000229 114 2 close
0.01 0.000192 64 3 pthread_mutex_destroy
0.01 0.000155 155 1 readlink
0.01 0.000135 67 2 strxfrm
0.01 0.000132 66 2 sysconf
0.01 0.000127 63 2 localeconv
0.01 0.000118 118 1 time
0.01 0.000118 118 1 write
0.01 0.000114 114 1 fcntl
0.01 0.000111 111 1 __fxstat64
0.00 0.000110 110 1 getuid
0.00 0.000109 109 1 sigaction
0.00 0.000109 109 1 signal
0.00 0.000104 104 1 geteuid
0.00 0.000102 102 1 getegid
0.00 0.000102 102 1 getgid
0.00 0.000094 94 1 __register_atfork
0.00 0.000083 83 1 pthread_setspecific
0.00 0.000069 69 1 strcmp
0.00 0.000068 68 1 __ctype_b_loc
0.00 0.000068 68 1 srand48_r
0.00 0.000068 68 1 getpid
0.00 0.000068 68 1 pthread_key_create
0.00 0.000067 67 1 memchr
0.00 0.000066 66 1 drand48_r
0.00 0.000065 65 1 pthread_key_delete
0.00 0.000064 64 1 nl_langinfo
------ ----------- ----------- --------- --------------------
100.00 2.211604 3157 total
Hello World!

Moving up a bit, it is time for some Python statistics:


################################################################################
The Python version
################################################################################
strace -c -f -q python hello.py
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
56.25 0.000027 0 177 118 open
43.75 0.000021 0 88 fstat64
0.00 0.000000 0 89 read
0.00 0.000000 0 1 write
0.00 0.000000 0 59 close
0.00 0.000000 0 1 execve
0.00 0.000000 0 10 10 access
0.00 0.000000 0 6 brk
0.00 0.000000 0 5 3 ioctl
0.00 0.000000 0 3 2 readlink
0.00 0.000000 0 29 munmap
0.00 0.000000 0 1 uname
0.00 0.000000 0 11 mprotect
0.00 0.000000 0 3 _llseek
0.00 0.000000 0 68 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 1 getcwd
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 57 mmap2
0.00 0.000000 0 100 52 stat64
0.00 0.000000 0 1 lstat64
0.00 0.000000 0 1 getuid32
0.00 0.000000 0 1 getgid32
0.00 0.000000 0 1 geteuid32
0.00 0.000000 0 1 getegid32
0.00 0.000000 0 4 getdents64
0.00 0.000000 0 1 fcntl64
0.00 0.000000 0 3 1 futex
0.00 0.000000 0 1 set_thread_area
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.000048 727 186 total
Hello world!


ltrace -c -f python hello.py
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
35.03 2.118504 1153 1837 free
34.89 2.109835 1226 1720 strcmp
10.84 0.655259 63 10259 memcpy
6.76 0.408768 64 6312 strlen
3.75 0.226680 63 3556 __ctype_b_loc
2.30 0.139353 65 2130 memset
1.75 0.105637 65 1625 malloc
1.07 0.064819 16204 4 qsort
0.84 0.050971 65 784 realloc
0.43 0.025782 65 396 strchr
0.32 0.019097 119 160 fopen64
0.24 0.014268 62 227 strcpy
0.21 0.012702 64 198 strncpy
0.20 0.011854 81 145 _IO_getc
0.19 0.011645 66 176 readdir64
0.19 0.011473 114 100 __xstat64
0.12 0.007396 108 68 sigaction
0.12 0.007067 62 113 pthread_self
0.11 0.006385 145 44 fclose
0.09 0.005559 173 32 fread
0.08 0.005079 108 47 __fxstat64
0.05 0.003305 63 52 fileno
0.05 0.003187 63 50 __strcpy_chk
0.05 0.002915 63 46 strrchr
0.04 0.002119 64 33 __errno_location
0.03 0.001530 63 24 memchr
0.03 0.001526 63 24 __rawmemchr
0.02 0.001389 463 3 setlocale
0.02 0.001136 63 18 __memcpy_chk
0.02 0.001092 64 17 sem_post
0.01 0.000880 62 14 memmove
0.01 0.000796 66 12 getenv
0.01 0.000723 65 11 strcat
0.01 0.000715 71 10 __vsnprintf_chk
0.01 0.000651 65 10 sem_trywait
0.01 0.000585 65 9 ferror
0.01 0.000549 109 5 isatty
0.01 0.000456 65 7 sem_wait
0.01 0.000347 86 4 fflush
0.01 0.000344 114 3 readlink
0.01 0.000336 67 5 __strncpy_chk
0.01 0.000329 164 2 opendir
0.01 0.000320 64 5 __ctype_tolower_loc
0.00 0.000295 73 4 strerror
0.00 0.000265 66 4 sigemptyset
0.00 0.000262 131 2 ftell
0.00 0.000245 245 1 fseek
0.00 0.000226 113 2 closedir
0.00 0.000199 66 3 sem_init
0.00 0.000199 66 3 strncat
0.00 0.000196 65 3 flockfile
0.00 0.000187 62 3 funlockfile
0.00 0.000182 182 1 __realpath_chk
0.00 0.000165 165 1 fwrite
0.00 0.000148 74 2 __strdup
0.00 0.000133 133 1 geteuid
0.00 0.000116 116 1 getuid
0.00 0.000115 115 1 getegid
0.00 0.000115 115 1 getgid
0.00 0.000113 113 1 rewind
0.00 0.000111 111 1 __uflow
0.00 0.000080 80 1 sysconf
0.00 0.000075 75 1 getpid
0.00 0.000074 74 1 __libc_current_sigrtmin
0.00 0.000069 69 1 __libc_current_sigrtmax
0.00 0.000069 69 1 clearerr
0.00 0.000069 69 1 __strcat_chk
0.00 0.000068 68 1 fputs
0.00 0.000068 68 1 ungetc
0.00 0.000068 68 1 nl_langinfo
------ ----------- ----------- --------- --------------------
100.00 6.047275 30341 total
Hello world!

Let see some Ruby (see if you can spot something peculiar!)


################################################################################
The Ruby version
################################################################################
strace -c -f -q ruby hello.rb
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000020 2 11 fstat64
0.00 0.000000 0 10 read
0.00 0.000000 0 1 write
0.00 0.000000 0 11 open
0.00 0.000000 0 11 close
0.00 0.000000 0 1 execve
0.00 0.000000 0 9 9 access
0.00 0.000000 0 4 brk
0.00 0.000000 0 1 gettimeofday
0.00 0.000000 0 2 munmap
0.00 0.000000 0 1 uname
0.00 0.000000 0 11 mprotect
0.00 0.000000 0 5 sigprocmask
0.00 0.000000 0 14 rt_sigaction
0.00 0.000000 0 3 rt_sigprocmask
0.00 0.000000 0 2 getrlimit
0.00 0.000000 0 26 mmap2
0.00 0.000000 0 1 getuid32
0.00 0.000000 0 1 getgid32
0.00 0.000000 0 2 geteuid32
0.00 0.000000 0 2 getegid32
0.00 0.000000 0 2 1 futex
0.00 0.000000 0 1 set_thread_area
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.000020 134 10 total
Hello World!


ltrace -c -f ruby hello.rb
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
90.19 0.008306 8306 1 ruby_init
8.44 0.000777 777 1 ruby_options
1.37 0.000126 126 1 ruby_init_stack
------ ----------- ----------- --------- --------------------
100.00 0.009209 3 total
Hello World!

Java, which already has been covered:


################################################################################
The Java version
################################################################################
strace -c -f -q java Hello
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.92 0.490957 3836 128 16 futex
0.04 0.000193 0 9718 gettimeofday
0.01 0.000056 0 206 149 open
0.01 0.000041 3 13 clone
0.01 0.000040 0 151 111 stat64
0.01 0.000029 0 768 read
0.01 0.000029 1 23 munmap
0.00 0.000023 0 734 _llseek
0.00 0.000000 0 2 write
0.00 0.000000 0 56 close
0.00 0.000000 0 1 unlink
0.00 0.000000 0 2 execve
0.00 0.000000 0 20 18 access
0.00 0.000000 0 1 1 mkdir
0.00 0.000000 0 13 brk
0.00 0.000000 0 1 setrlimit
0.00 0.000000 0 4 readlink
0.00 0.000000 0 1 ftruncate
0.00 0.000000 0 4 uname
0.00 0.000000 0 47 mprotect
0.00 0.000000 0 2 getdents
0.00 0.000000 0 2 sched_yield
0.00 0.000000 0 27 rt_sigaction
0.00 0.000000 0 48 rt_sigprocmask
0.00 0.000000 0 1 getcwd
0.00 0.000000 0 4 getrlimit
0.00 0.000000 0 126 mmap2
0.00 0.000000 0 58 1 lstat64
0.00 0.000000 0 51 fstat64
0.00 0.000000 0 3 getuid32
0.00 0.000000 0 2 getgid32
0.00 0.000000 0 3 geteuid32
0.00 0.000000 0 2 getegid32
0.00 0.000000 0 4 madvise
0.00 0.000000 0 8 getdents64
0.00 0.000000 0 15 fcntl64
0.00 0.000000 0 14 gettid
0.00 0.000000 0 88 sched_getaffinity
0.00 0.000000 0 2 set_thread_area
0.00 0.000000 0 2 set_tid_address
0.00 0.000000 0 19 clock_gettime
0.00 0.000000 0 1 clock_getres
0.00 0.000000 0 15 set_robust_list
0.00 0.000000 0 2 socket
0.00 0.000000 0 2 2 connect
------ ----------- ----------- --------- --------- ----------------
100.00 0.491368 12394 298 total
Hello World!


ltrace -c -f java Hello
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
71.44 2.000665 181878 11 strncmp
27.65 0.774271 774271 1 pthread_join
0.20 0.005578 68 82 fgets
0.13 0.003744 3744 1 dlopen
0.06 0.001678 279 6 sysconf
0.06 0.001583 63 25 strlen
0.05 0.001525 89 17 JLI_MemAlloc
0.05 0.001490 62 24 JLI_StringDup
0.05 0.001478 61 24 strcspn
0.04 0.001228 61 20 strspn
0.04 0.001075 63 17 JLI_MemFree
0.03 0.000706 64 11 getenv
0.02 0.000579 72 8 __sprintf_chk
0.02 0.000493 61 8 strrchr
0.02 0.000456 76 6 __strcat_chk
0.01 0.000370 370 1 pthread_create
0.01 0.000347 173 2 fclose
0.01 0.000321 64 5 strchr
0.01 0.000263 131 2 readlink
0.01 0.000263 131 2 fopen
0.01 0.000240 120 2 __xstat
0.01 0.000234 117 2 access
0.01 0.000219 109 2 getgid
0.01 0.000213 106 2 getuid
0.01 0.000212 106 2 getegid
0.01 0.000209 104 2 geteuid
0.01 0.000148 74 2 dlsym
0.01 0.000147 73 2 strncpy
0.00 0.000126 63 2 JLI_FreeManifest
0.00 0.000125 62 2 __strcpy_chk
0.00 0.000124 62 2 fflush
0.00 0.000074 74 1 pthread_attr_destroy
0.00 0.000074 74 1 JLI_WildcardExpandClasspath
0.00 0.000068 68 1 putenv
0.00 0.000068 68 1 pthread_attr_init
0.00 0.000067 67 1 strcat
0.00 0.000065 65 1 pthread_attr_setdetachstate
0.00 0.000065 65 1 getpid
0.00 0.000064 64 1 pthread_attr_setstacksize
------ ----------- ----------- --------- --------------------
100.00 2.800655 303 total
Hello World!

C (can you spot the gcc switch I used btw?)


################################################################################
The C version
################################################################################
strace -c -f -q ./helloC
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
-nan 0.000000 0 1 read
-nan 0.000000 0 1 write
-nan 0.000000 0 2 open
-nan 0.000000 0 2 close
-nan 0.000000 0 1 execve
-nan 0.000000 0 3 3 access
-nan 0.000000 0 1 brk
-nan 0.000000 0 1 munmap
-nan 0.000000 0 4 mprotect
-nan 0.000000 0 7 mmap2
-nan 0.000000 0 3 fstat64
-nan 0.000000 0 1 set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 27 3 total
Hello World!


ltrace -c -f ./helloC
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
100.00 0.000426 426 1 __printf_chk
------ ----------- ----------- --------- --------------------
100.00 0.000426 1 total
Hello World!

Assembly

################################################################################
The assembly version
################################################################################
strace -c -f -q ./helloAsm
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
-nan 0.000000 0 1 write
-nan 0.000000 0 1 execve
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 2 total
Hello World!


ltrace -c -f ./helloAsm
ltrace: Couldn't find .dynsym or .dynstr in "./helloAsm"
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
------ ----------- ----------- --------- --------------------
100.00 0.000000 0 total

Obviously, this is not a comparative post, all the languages mentioned have their own design goals and not all implementations are equivalent.

CPython threading vs multiprocessing – the 5 minute introduction

MJC brought to my attention the following piece of code which attempts to provide some empirical data comparing the use of threads vs processes in CPython.

This code, which I understand that it is stored for historical purposes, has some bugs, which I have pointed out in the comment section, but this is not the point of this article (something like this would be not constructive). I believe that a far better contribution would be to write a short piece attempting to point out a few “gotchas”.

Bit rot took hold and the post above has disappeared but the rest of the article still applies …

For the rest of this small article, I assume that you are familiar with how threads work in CPython, especially the GIL “issue”. If you are not then, in layman’s terms, the GIL lock is one big lock that ensures that only one thread is executing per Python interpreter, in order to facilitate more efficient garbage collection. It should be clarified that the GIL lock applies to CPython and not Jython, which manages to “map” python threads to “true-parallelizable-per-core-as-they-were-used-to-be” threads. Jython comes with a performance penalty though so YMMV. (NB: Ruby 1.8.x suffers from the same “issue” so no smug Ruby fans need apply 🙂 ) To sum it up, in CPython, only one thread is allowed to use the python interpreter at any given time. “What” I can hear you lamenting “this will kill my threading performance”. However, keep on reading, all is not lost.

First of all, the GIL ensures that only one thread is executing Python bytecode. However, obviously, this is a performance killer in CPU intensive applications, if you are solving an I/O bound problem then the effect is reduced. If you are using CPython to interface with C code (something that CPython does implicitly for certain library calls), once you have reached C territory you are GIL-free. In order to overcome this limitation, the multiprocessing module was introduced, a module that shares the API with threading module. Multiprocessing attempts a workaround of the GIL limitation by spawning a new python interpreter in a separate process. In a multicore system, this allows for parallelism, much better than threads. However, someone might assume at first that “wow, since a design goal of the API was to be nearly identical, now I can do something along the lines of :s/threading.Thread/multiprocessing.Process/g and I am home free and reap all the performance benefits”.

Not so fast, junior.

Access in global variables is the same when using threading, however if you are using multiprocessing you have to specify explicitly what is shared between the various python interpreters, which adds a bit of code complexity . A nice security side-effect of using multiprocessing, as opposed to threads, is that by controlling what is really shared between processes you get a poor-man’s sandbox for free. (I will update this blog post with some practical examples a bit later)

One might be able to get a few small speedups and proper process control by using a process pool (e.g. you definitely do not want your program to start creating processes uncontrollably, you might hit OS limits or worse have a HIDS detect a privilege abuse or something similar). Given the fact that creating a new process is a somewhat costly operation, in the simplest form, you just create a bunch of them upfront, have them “sleeping” and assign tasks as needed. Given that there are a lot of thread pool implementations out there and that the multiprocessing API closely matches, one should be able to code something like that in a small amount of time. A nice trick that I have seen in practice is fire up X processes, where X is the number of cores and then assign a thread pool to each of them.

Closing, in a multicore system, when solving CPU intensive problems, you are likely to get speed improvements by using multiprocessing as opposed to threads. As with all forms of parallel programming, subtle bugs might be introduced. I hope that my post gave a nice starting point for some “gotchas”