Profiling with Digital Mars D Compiler on Windows

D programming language is a modern, natively-compiled, statically-typed system language. While being strongly influenced, and keeping a certain level of compatibility with C++, it tries to avoid many design flaws of its big brother. One of important aspects of D is that it makes simple things simple. Among these things are built-in array types, a built-in garbage collector, and a built-in profiler.

To enable profiling of your program you simply add the -profile switch to the DMD’s command line. The profiling code is injected directly into the compiled executable. After the instrumented program finishes executing, two plain text files are created: trace.log and trace.def. Running the program several times will update these files and combine the profiling data with the already existing results.

To make an explanation below more illustrative, let’s profile a simple program, hello.d:

void main()
{
  foo();
  bar();
}
void foo()
{
}
void bar()
{
  foo();
}

compiled and profiled like this:

dmd hello.d -profile
hello

trace.log

This is the main source for performance-related information. trace.log from the above hello.d should look like this:

------------------
	    1	__Dmain
_D5hello3barFZv	1	9	8
	    1	_D5hello3fooFZv
------------------
	    1	__Dmain
	    1	_D5hello3barFZv
_D5hello3fooFZv	2	2	2
------------------
__Dmain	0	25	15
	    1	_D5hello3fooFZv
	    1	_D5hello3barFZv

======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs ========

  Num          Tree        Func        Per
  Calls        Time        Time        Call

      1           6           4           4     __Dmain
      1           2           2           2     _D5hello3barFZv
      2           0           0           0     _D5hello3fooFZv

There are two distinct sections in this file: a call graph, and function timings.

The call graph elements are separated by dashed lines. Each element describes a single function, its callers (fan in) and callees (fan out). Walter explained this nicely in his newsgroup post.

In the trace log above the call graph element for bar() is

	    1	__Dmain
_D5hello3barFZv	1	9	8
	    1	_D5hello3fooFZv

First, there is fan in consisting of one function, main(). Number 1 before __Dmain means that main() called the bar() once during the profiling session.

Next, there is the function this element describes. The line starts with the function name, then there are three numbers: total number of times this function is called, total time spent executing this function and the functions it calls, and total time spent executing this function excluding any functions it calls itself.

Important: the times specified here are in ticks, not any sort of seconds!

In the example above we can see that bar() were called once, spent 9 ticks running, 8 ticks in its own body and 1 tick in other functions.

After the element’s function there goes fan out. The format is the same as for fan in but the number before the function name means number of times the corresponding function was called from this element. Here we see that bar() called foo() once.

The timings section starts with a row of equal signs and is pretty self-explanatory. The section header establishes connection between ticks mentioned earlier and actual seconds. Also it states that all times from now on are specified in microseconds which are 1 000 000 per second.

Data in this section is split into 5 columns: number of times the function was called, total time spent in this function and all functions called from it, total time spent in this function only, average time consumed by this function per call, and the function’s mangled name. This table is sorted by the 3rd column so that functions with largest pure execution times are at the top. In our example, the most complex function is main() because it contains two function calls and therefore executes longest.

trace.def

This file, as extension suggests, is a skeleton of a module definition file. It allows OPTLINK to arrange functions within the executable in the order they actually call each other, reducing number of potential cache misses and long jump instructions.

This file is not readily usable however. At a minimum, you must add the EXETYPE NT at the start of the file to denote it’s for a 32-bit executable. Now we can use it to build an optimized hello:

dmd hello.d trace.def

Note for Tango users: this method does not work if your Tango installation adds tango-user-dmd.lib to the sc.ini as -L+tango-user-dmd.lib. This method of Tango installation assumes that the last group of linker options is always a list of libraries. However, when a .def file is used, another group of options is added to the linker command line so that the tango-user-dmd.lib is treated as a resource, not a library. I don’t have a perfect solution for this. You may want to resort to other methods of linking to tango-user-dmd.lib on Windows.

Afterword

Here I’ve covered basic technical aspects of profiling with DMD. But it is also important to know how to use information provided by the profiling process. In my next post I will profile Blaze, a 2D physics library, and discuss some pitfalls awaiting you on the way of profiling.

2 Responses

  1. [...] Sergey Gromov has kicked off a series Profiling with Digital Mars D Compiler on Windows. The first post explains the process, including how to make use of the files produced when profiling. [...]

  2. [...] with DMD on Windows: Getting Hands Dirty In my previous post I was talking about steps one needs to undertake and data they get when profiling an application. [...]

Leave a Reply