<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>&#60;code&#62;</title>
	<atom:link href="http://snakecoder.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://snakecoder.wordpress.com</link>
	<description>a twice-a-year blog</description>
	<lastBuildDate>Tue, 28 Jul 2009 12:12:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='snakecoder.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>&#60;code&#62;</title>
		<link>http://snakecoder.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://snakecoder.wordpress.com/osd.xml" title="&#60;code&#62;" />
	<atom:link rel='hub' href='http://snakecoder.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Language Support for Tuples in D</title>
		<link>http://snakecoder.wordpress.com/2009/07/28/language-support-for-tuples-in-d/</link>
		<comments>http://snakecoder.wordpress.com/2009/07/28/language-support-for-tuples-in-d/#comments</comments>
		<pubDate>Tue, 28 Jul 2009 01:31:01 +0000</pubDate>
		<dc:creator>Sergey Gromov</dc:creator>
				<category><![CDATA[D]]></category>
		<category><![CDATA[D Programming Language]]></category>
		<category><![CDATA[Programming Language Design]]></category>

		<guid isPermaLink="false">http://snakecoder.wordpress.com/?p=203</guid>
		<description><![CDATA[I think tuple support in the D programming language can be improved.  Here are some initial thoughts on the matter.]]></description>
			<content:encoded><![CDATA[<p>Incomplete support for <a href="http://www.digitalmars.com/d/2.0/template.html#TemplateTupleParameter">tuples</a> is often discussed in the <a href="http://www.digitalmars.com/webnews/newsgroups.php?search_txt=&amp;group=digitalmars.D">D community</a>.  I want to write a <a href="http://prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs">DIP</a> on them.  Here are some considerations.</p>
<p><span id="more-203"></span></p>
<h1>Terminology</h1>
<p>The notion of a tuple comes from <a href="http://en.wikipedia.org/wiki/Tuple">math</a>, where it means a fixed sequence of related but possibly completely different entities.  In dynamic languages, tuples are fixed-length lists of values of arbitrary types.  The closest analogy in C-family languages would be an anonymous struct.  In D things are somewhat more complicated, though.</p>
<p>In D, a tuple is strictly a compile-time construct.  It is a fixed-length list of types, compile-time expressions, or symbols, in any combination.  There are two obvious and useful tuple sub-classes: type tuples consisting only of types, and expression tuples consisting only of expressions.  These are mentioned in the <a href="http://www.digitalmars.com/d/2.0/template.html#TemplateTupleParameter">documentation</a> and seem to be purely conventional.</p>
<p>But there&#8217;s more to it.  If you have a type tuple, you can declare a variable of that type.  I&#8217;m not sure if this is documented anywhere.  This variable is a genuine run-time variable.  It has a tuple type even though the documentation asserts that tuples are not types.  You can change the value of this variable, either as a whole, by assigning a compatible expression tuple or another such variable to it, or by modifying individual components through indexing.  Still, such a variable displays some tuple characteristics, such as flattening when passed as a function argument.</p>
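<p>Here is a minimal sketch of such a variable.  The <code>Seq</code> template and the <code>sum</code> function are hypothetical helpers made up for this illustration; <code>Seq</code> only plays the role of a bare tuple constructor so that the example does not depend on any library module.</p>
<pre class="brush: cpp;">
// Hypothetical helper: a bare tuple constructor.
template Seq(T...)
{
    alias T Seq;
}

// Hypothetical function taking two separate arguments.
int sum(int a, int b)
{
    return a + b;
}

void main()
{
    Seq!(int, int) pair;    // a variable whose type is a type tuple
    pair[0] = 3;            // components are modified at run time by indexing
    pair[1] = 4;
    assert(sum(pair) == 7); // the variable flattens into sum(pair[0], pair[1])
}
</pre>
<p>The call to <code>sum</code> is the flattening mentioned above: one tuple variable fills both parameters.</p>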
<p>I will call such variables <em>tuple variables</em>, and values in them <em>tuple values</em>.  They&#8217;re distinct from tuples as such.  This is why I don&#8217;t like the naming of <tt>std.typetuple.TypeTuple</tt> and <tt>std.typecons.Tuple</tt>: they&#8217;re both misnomers.  <tt>TypeTuple</tt> is actually a generic tuple constructor which can create <em>any</em> tuple supported by the compiler.  <tt>Tuple</tt> constructs an anonymous struct which contains a tuple variable as an alternative means of accessing the struct fields, but is otherwise not a tuple at all.</p>
<h1>Thoughts</h1>
<p>There are several things required so that the language feels like it supports tuples:</p>
<ul>
<li>Sugar for type tuple construction</li>
<li>Sugar for tuple value construction: tuple literals</li>
<li>Parallel assignment</li>
</ul>
<p>I agree with others in the community that reusing the comma operator for that would be perfect.  I think it&#8217;s possible:</p>
<ul>
<li>Let the result of the comma operator be a tuple instead of the expression after the last comma</li>
<li>Allow types in place of expressions</li>
<li>Let the semantic analyzer reject mixed tuple literals as invalid</li>
<li>Treat type tuples as regular types, and expression tuples as tuple value literals</li>
<li>Support C: when a tuple value is cast to a scalar type, replace the tuple value with its last element</li>
</ul>
<h1>Caveats</h1>
<p>I can see two for now:</p>
<ol>
<li>Weird tuple to scalar cast</li>
<li>Tuple flattening may conflict with C compatibility:
<p><code>void foo(...);<br />
foo((a, b, c));<br />
</code></p>
<p>This code means <tt>foo(c)</tt> in C but <tt>foo(a, b, c)</tt> in this proposal.</p></li>
</ol>
<p>Point 1 can be dealt with: either discarded, or such a conversion made illegal which I don&#8217;t think will hurt many.  But point 2 may be a real show-stopper.</p>
]]></content:encoded>
			<wfw:commentRss>http://snakecoder.wordpress.com/2009/07/28/language-support-for-tuples-in-d/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e2d0143487b4b6550a03ed6acfa70a71?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">SnakE</media:title>
		</media:content>
	</item>
		<item>
		<title>Profiling with DMD on Windows: Getting Hands Dirty</title>
		<link>http://snakecoder.wordpress.com/2009/03/02/profiling-with-dmd-on-windows-getting-hands-dirty/</link>
		<comments>http://snakecoder.wordpress.com/2009/03/02/profiling-with-dmd-on-windows-getting-hands-dirty/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 01:59:28 +0000</pubDate>
		<dc:creator>Sergey Gromov</dc:creator>
				<category><![CDATA[D]]></category>
		<category><![CDATA[D Programming Language]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://snakecoder.wordpress.com/?p=160</guid>
		<description><![CDATA[In my previous post I was talking about steps one needs to undertake and data they get when profiling an application. Now let&#8217;s use this knowledge in practice. I will be profiling Blaze, a 2D physics engine based on Box2D. It was actually Mason&#8217;s newsgroup post which inspired me to blog about profiling. Prerequisites To [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/">previous post</a> I talked about the steps one needs to take, and the data one gets, when profiling an application.  Now let&#8217;s put this knowledge into practice.</p>
<p>I will be profiling <a href="http://www.dsource.org/projects/blaze">Blaze</a>, a 2D physics engine based on <a href="http://www.box2d.org/">Box2D</a>.  It was actually <a href="http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&amp;article_id=84819">Mason&#8217;s newsgroup post</a> which inspired me to blog about profiling.</p>
<p><span id="more-160"></span></p>
<h3>Prerequisites</h3>
<p>To repeat my experiments you will need:</p>
<ul>
<li>DMD compiler with Tango.  The simplest is to get it from the Tango <a href="http://www.dsource.org/projects/tango/wiki/DmdDownloads">official download site</a>.  The &#8220;current release (including DMD 1.033)&#8221; is what I&#8217;m actually using.  Here is a <a href="http://downloads.dsource.org/projects/tango/0.99.7/tango-0.99.7-bin-win32-dmd.1.033.zip">direct link</a> to download it.</li>
<li>Blaze source code.  In this article I&#8217;m hacking trunk revision 409 so I encourage you to get the same version.  You can do so using either a Subversion client:
<pre class="aside" style="width:auto;overflow:scroll;">svn checkout -r 409 http://svn.dsource.org/projects/blaze/trunk blaze</pre>
<p>or the Blaze <a href="http://www.dsource.org/projects/blaze/browser/trunk?rev=409">repository browser</a>, the &#8220;Download in other formats: Zip Archive&#8221; link at the bottom of the page.  Here&#8217;s a <a>direct link</a> if you wish to save some browsing.</li>
<li><a href="http://www.dsource.org/projects/derelict">Derelict</a>.  I&#8217;m using trunk revision 336, but I think any recent version will suffice.  Get it either via SVN
<pre class="aside" style="width:auto;overflow:scroll;">svn checkout http://svn.dsource.org/projects/derelict/trunk derelict</pre>
<p>or from the <a href="http://www.dsource.org/projects/derelict/browser/trunk">repository browser</a>, <a href="http://www.dsource.org/projects/derelict/changeset/337/trunk?old_path=%2F&amp;format=zip">direct link</a> included.</li>
<li><a href="http://www.libsdl.org/index.php">SDL</a>.dll.  You can get a pre-built Windows version from <a href="http://www.libsdl.org/download-1.2.php">the official downloads page</a>.</li>
</ul>
<p>That&#8217;s it.  You won&#8217;t need anything else.  Make sure you put DMD&#8217;s bin directory on the path; the rest of this article assumes that.</p>
<h3>Build</h3>
<p>Blaze comes with a testbed application and a set of <a href="http://dsource.org/projects/dsss">DSSS</a> configuration files.  I won&#8217;t use DSSS though.  I&#8217;ll use <a href="http://www.dsource.org/projects/tango/wiki/Jake">Jake</a> instead&#8212;a minimalistic, one-step build utility similar to <a href="http://www.dsource.org/projects/build">Bud</a> and <a href="http://www.dsource.org/projects/dsss/wiki/Rebuild">Rebuild</a> but more robust because it relies upon DMD&#8217;s built-in dependency resolution system.  And it&#8217;s simple.  And it&#8217;s bundled with the Tango distribution mentioned above so you should already have it.  In fact I&#8217;m using a similar tool of my own which works not only with Tango but with <a href="http://www.digitalmars.com/d/1.0/phobos/phobos.html">Phobos</a> as well, but that&#8217;s another story.  Jake is perfectly sufficient in the scope of this article.</p>
<p>I use the following command to build the testbed application:</p>
<pre class="aside" style="width:auto;overflow:scroll;">jake blazeDemos.d -O -release -inline -profile -I..\blaze -I\home\snake\src\derelict\DerelictGL -I\home\snake\src\derelict\DerelictUtil -I\home\snake\src\derelict\DerelictGLU -I\home\snake\src\derelict\DerelictSDL</pre>
<p>Of course, you should change the paths to Derelict according to your setup.  The build takes around 12 seconds on my 1.8 GHz Core2 Duo laptop.  Don&#8217;t forget to copy <em>sdl.dll</em> into the directory where you build the testbed app.  Now we are ready to</p>
<h3>Profile!</h3>
<p>Run the <em>blazeDemos.exe</em>.  The app starts with the default domino demo.  It runs slowly, much slower than without profiling enabled.</p>
<p>The profiler only measures the performance of code that is actually executed, so it is essential to let the application run for some time.  I suggest you wait until everything stops moving except for the pendulum, then close the application.  This way many code paths are executed and therefore measured, and the results are easy to reproduce.  It takes around 30 seconds on my laptop for the scene to completely stabilize.</p>
<p>Let&#8217;s see what we&#8217;ve got.  I want generic performance information for now, so I skip right to the overall timings section of <em>trace.log</em>, which starts with a row of equal signs.  Here are the first several lines:</p>
<pre class="aside" style="width:auto;overflow:scroll;">
======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs ========

  Num          Tree        Func        Per
  Calls        Time        Time        Call

      1     9718996     2105879     2105879     __Dmain
    859     1767855     1100816        1281     _D5blaze9collision5nbody12sortAndSweep12SortAndSweep6searchMFZv
    859      741745      618140         719     _D10blazeDemos9drawSceneFZv
  53649      654964      571310          10     _D5blaze9collision8pairwise11collidePoly14edgeSeparationFC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormiC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormZf
   2789      628572      542171         194     _D5blaze8dynamics7contact13contactSolver13ContactSolver5_ctorMFS5blaze5world8TimeStepAC5blaze8dynamics7contact7contact7ContactiZC5blaze8dynamics7contact13contactSolver13ContactSolver
      1      361446      361442      361442     _D10blazeDemos14createGLWindowFAaiiibZv
    859      322915      279146         324     _D5blaze9collision5nbody12sortAndSweep12SortAndSweep9shellSortMFZv
  15712      914581      240098          15     _D5blaze9collision8pairwise11collidePoly17findMaxSeparationFKiC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormZf
</pre>
<p>Let&#8217;s see.  First comes <code>__Dmain</code>, that is, <code>main()</code>.  Its Tree Time is 9718996 microseconds, which is almost 10 seconds.  <code>main()</code> is the application starting point, so everything else is called from it (the testbed application is single-threaded).  Therefore <code>main</code>&#8217;s tree time must be the total program execution time.  But the measured program run time was 30 seconds.  Why the difference?  Because profiling comes with a huge overhead: the profiling code itself runs for about 2/3 of the time.  Without profiling the app would run 3 times faster.  This is no surprise.</p>
<p>What is actually surprising is the Func Time column.  Func time is the total time spent in the body of a function, excluding any functions it calls.  As I <a href="http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/">explained earlier</a>, this time is used to sort the function table so that the most time-consuming functions are at the top of the list.  And the top one is <code>main</code>!  What&#8217;s more, according to the measurements <code>main</code> consumes more than 20% of the execution time:</p>
<p style="margin-left:4em;">2.1 / 9.72 * 100 = 21.6%</p>
<p>This is really unexpected.  <code>main()</code> contains the outermost loop, whose body executes once per frame, and it only calls a small number of other functions.  The main loop activity is easy to see in <code>main</code>&#8217;s fan out:</p>
<pre class="aside" style="width:auto;overflow:scroll;">
__Dmain	0	34789587	7538091
	    1	_D10blazeDemos14createGLWindowFAaiiibZv
	    1	_D10blazeDemos10keyPressedFiZv
	  859	_D10blazeDemos11processTimeFZv
	  859	_D5tango4text7convert7Integer13__T6decodeTaZ6decodeFAaKaJaJiZv
	  859	_D5tango4text7convert7Integer16__T9formatterTaZ9formatterFAalaaiZAa
	  859	_D5tango4stdc7stringz9toStringzFAaAaZPa
	  859	_D5blaze5world5World4stepMFfiiZv
	  859	_D7dominos7Dominos6updateMFZv
	  859	_D10blazeDemos13processEventsFZv
	  859	_D10blazeDemos9drawSceneFZv
	  859	_D10blazeDemos8limitFPSFkZv
</pre>
<p><code>main()</code> only calls 11 other functions, fewer than 900 times each.  This shouldn&#8217;t take more time than complicated collision detection.  But it does.  Here&#8217;s why.</p>
<blockquote><p>
The profiler takes into account only functions compiled with the <code style="white-space:nowrap;">-profile</code> switch.  It is blind to any other functions.  If an instrumented function calls a non-instrumented one, the call time is added to the caller&#8217;s &#8220;func time&#8221; as if the called function were inlined.
</p></blockquote>
<p>Looking at <code>main()</code>, it is easy to come up with a list of potentially offending functions:</p>
<dl>
<dt><code>glClear()</code></dt>
<dd>A direct call to OpenGL dynamic library</dd>
<dt><code>SDL_WM_SetCaption()</code></dt>
<dd>A direct call to SDL dynamic library</dd>
<dt><code>SDL_GL_SwapBuffers()</code></dt>
<dd>Again, a direct call to SDL dynamic library</dd>
</dl>
<p>Let&#8217;s wrap each into a separate D function, like <code>mainClean</code>, <code>mainCaption</code>, and <code>mainSwap</code>.  Don&#8217;t forget to disable inlining for this trick to work.</p>
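<p>The wrapping idea can be sketched on its own, outside Blaze.  In the sketch below, <code>abs()</code> from the C runtime merely stands in for an external call such as <code>glClear()</code>, and <code>mainAbs</code> is a made-up wrapper name.  The assumption is that the C runtime is not compiled with <code style="white-space:nowrap;">-profile</code>, so time spent in it would otherwise be charged to the caller.</p>
<pre class="aside" style="width:auto;overflow:scroll;">
// Stand-in for an external library function: declared by hand,
// linked from the C runtime, not instrumented by the profiler.
extern (C) int abs(int x);

// Thin D wrapper.  Built with -profile and without -inline, it gets
// its own row in trace.log instead of inflating the caller's func time.
int mainAbs(int x)
{
    return abs(x);
}

void main()
{
    int total = 0;
    for (int i = -1000; i != 1000; i++)
        total += mainAbs(i);
    assert(total == 1_000_000);
}
</pre>
<p>Built with <code style="white-space:nowrap;">dmd -profile</code> and no <code style="white-space:nowrap;">-inline</code>, the wrapper should show up as a separate function in the timings table.</p>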
<blockquote><p>
<strong>Important!</strong>  Every run of an instrumented application updates <em>trace.log</em> and <em>trace.def</em> to include new results.  But the application doesn&#8217;t check whether <em>trace.log</em> and <em>trace.def</em> are from the same application.  If they&#8217;re not, or if they&#8217;re from another build of the same application, they get corrupted.  I think there should be some sort of checksum at the start of these files so that the app can compare it against its own checksum and overwrite incompatible files completely.  For now though, <strong>always erase <em>trace.log</em> and <em>trace.def</em> after re-building your test application.</strong>
</p></blockquote>
<p>The trick works.  <code>main()</code> drops to 93rd place, consuming only a modest 21 milliseconds.  And the first place is taken by <code>mainCaption</code>.  That&#8217;s it.  Setting the window caption takes more than 1/5 of the execution time of an optimized application.</p>
<p>Finding out why setting the caption is so slow is probably out of the scope of this article.  I&#8217;ll just assume there is a reason and hack around it.  There is a variable, <code>frames</code>, which is set to zero every time the frame rate counter, <code>fps_</code>, is refreshed.  So I&#8217;ll add a check and only update the caption when this happens.  Here&#8217;s the final code fragment:</p>
<pre class="aside" style="width:auto;overflow:scroll;">
        if (frames == 0) {
            // Create fps caption
            char[] title = "Blaze Demo - Press Keys 0-9, q-e To Switch Demos - FPS: ";
            title  ~= Integer.format (new char[32], fps_);
            void mainCaption() {
                SDL_WM_SetCaption(toStringz(title), null);
            }
            mainCaption();
        }
</pre>
<p>Rebuild, erase <em>trace.*</em>, run.  Now <code>mainCaption</code> is in 28th place with only 1/4 second consumed.  The first place is rightly taken by <code>solveVelocityConstraints</code>, which is probably computation-intensive.</p>
<p>Final test, with inlining.  Compile, erase, run:</p>
<pre class="aside" style="width:auto;overflow:scroll;">
======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs ========

  Num          Tree        Func        Per
  Calls        Time        Time        Call

    829     1691967     1065306        1285     _D5blaze9collision5nbody12sortAndSweep12SortAndSweep6searchMFZv
    829      715255      600852         724     _D10blazeDemos9drawSceneFZv
  52839      647945      565433          10     _D5blaze9collision8pairwise11collidePoly14edgeSeparationFC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormiC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormZf
   2729      619133      535539         196     _D5blaze8dynamics7contact13contactSolver13ContactSolver5_ctorMFS5blaze5world8TimeStepAC5blaze8dynamics7contact7contact7ContactiZC5blaze8dynamics7contact13contactSolver13ContactSolver
      1      345129      345126      345126     _D10blazeDemos14createGLWindowFAaiiibZv
    829      305158      264505         319     _D5blaze9collision5nbody12sortAndSweep12SortAndSweep9shellSortMFZv
  15472      902121      235004          15     _D5blaze9collision8pairwise11collidePoly17findMaxSeparationFKiC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormZf
   2729      355048      223354          81     _D5blaze8dynamics7contact13contactSolver13ContactSolver24solvePositionConstraintsMFfZb
   8187      251732      218908          26     _D5blaze8dynamics7contact13contactSolver13ContactSolver24solveVelocityConstraintsMFZv
  16368      251190      217956          13     _D5blaze9collision8pairwise8distance8inPointsFS5blaze6common4math5bVec2AS5blaze6common4math5bVec2iZb
   4945      467027      170846          34     _D5blaze9collision8pairwise8distance103__T15distanceGenericTC5blaze9collision6shapes7polygon7PolygonTC5blaze9collision6shapes7polygon7PolygonZ15distanceGenericFKS5blaze6common4math5bVec2KS5blaze6common4math5bVec2C5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormC5blaze9collision6shapes7polygon7PolygonS5blaze6common4math6bXFormZf
   2726     1784432      167115          61     _D5blaze8dynamics6island6Island5solveMFS5blaze5world8TimeStepS5blaze6common4math5bVec2bZv
      1     7668760      165340      165340     __Dmain
</pre>
<p><code>main()</code> is in 13th place, consuming only 165 milliseconds in total, all title updates included because of inlining.  The clean measured run time is now 7.67 seconds instead of 9.72.  That&#8217;s a speed-up of more than 20%.  Not bad at all.</p>
<h3>Afterword</h3>
<p>You&#8217;ve probably heard of premature optimization.  It&#8217;s when you waste time optimizing something just to find out later that it didn&#8217;t matter, but you could have had a 20% speed gain by writing a simple <code>if</code> in a completely different place you couldn&#8217;t even think of.  Or when you realize that you must optimize your heavily optimized algorithm differently, essentially re-implementing it.</p>
<p>Don&#8217;t ever make this mistake.  Use a profiler.</p>
]]></content:encoded>
			<wfw:commentRss>http://snakecoder.wordpress.com/2009/03/02/profiling-with-dmd-on-windows-getting-hands-dirty/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e2d0143487b4b6550a03ed6acfa70a71?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">SnakE</media:title>
		</media:content>
	</item>
		<item>
		<title>Profiling with Digital Mars D Compiler on Windows</title>
		<link>http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/</link>
		<comments>http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/#comments</comments>
		<pubDate>Thu, 26 Feb 2009 16:36:41 +0000</pubDate>
		<dc:creator>Sergey Gromov</dc:creator>
				<category><![CDATA[D]]></category>
		<category><![CDATA[D Programming Language]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://snakecoder.wordpress.com/?p=3</guid>
		<description><![CDATA[The D programming language is a modern, natively-compiled, statically-typed system language. While being strongly influenced by C++ and keeping a certain level of compatibility with it, it tries to avoid many design flaws of its big brother. One of the important aspects of D is that it makes simple things simple. Among these things are built-in array types, a [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.digitalmars.com/d/">D programming language</a> is a modern, natively-compiled, statically-typed system language.  While being strongly influenced by C++ and keeping a certain level of compatibility with it, it tries to avoid many design flaws of its big brother.  One of the important aspects of D is that it makes simple things simple.  Among these things are built-in array types, a built-in garbage collector, and a built-in profiler.</p>
<p><span id="more-3"></span></p>
<p>To enable profiling of your program, you simply add the <kbd>-profile</kbd> switch to DMD&#8217;s command line.  The profiling code is injected directly into the compiled executable.  After the instrumented program finishes executing, two plain text files are created: <var>trace.log</var> and <var>trace.def</var>.  Running the program several times will update these files and combine the profiling data with the already existing results.</p>
<p>To make the explanation below more concrete, let&#8217;s profile a simple program, <var>hello.d</var>:</p>
<p><pre class="brush: cpp;">
void main()
{
  foo();
  bar();
}
void foo()
{
}
void bar()
{
  foo();
}
</pre></p>
<p>compiled and profiled like this:</p>
<pre class="aside">
dmd hello.d -profile
hello
</pre>
<h3>trace.log</h3>
<p>This is the main source for performance-related information.  <var>trace.log</var> from the above <var>hello.d</var> should look like this:</p>
<pre class="aside" style="width:auto;height:8em;overflow:scroll;">
------------------
	    1	__Dmain
_D5hello3barFZv	1	9	8
	    1	_D5hello3fooFZv
------------------
	    1	__Dmain
	    1	_D5hello3barFZv
_D5hello3fooFZv	2	2	2
------------------
__Dmain	0	25	15
	    1	_D5hello3fooFZv
	    1	_D5hello3barFZv

======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs ========

  Num          Tree        Func        Per
  Calls        Time        Time        Call

      1           6           4           4     __Dmain
      1           2           2           2     _D5hello3barFZv
      2           0           0           0     _D5hello3fooFZv
</pre>
<p>There are two distinct sections in this file: a call graph, and function timings.</p>
<p>The call graph elements are separated by dashed lines.  Each element describes a single function, its callers (fan in) and callees (fan out).  Walter explained this nicely in his <a href="http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&amp;article_id=56693">newsgroup post</a>.</p>
<p>In the trace log above the call graph element for <code>bar()</code> is</p>
<pre class="aside">
	    1	__Dmain
_D5hello3barFZv	1	9	8
	    1	_D5hello3fooFZv
</pre>
<p>First, there is the fan in, consisting of one function, <code>main()</code>.  The number 1 before <code>__Dmain</code> means that <code>main()</code> called <code>bar()</code> once during the profiling session.</p>
<p>Next, there is the function this element describes.  The line starts with the function name, then there are three numbers: total number of times this function is called, total time spent executing this function <em>and</em> the functions it calls, and total time spent executing this function <em>excluding</em> any functions it calls itself.</p>
<blockquote><p><strong>Important:</strong> the times specified here are in ticks, not any sort of seconds!</p></blockquote>
<p>In the example above we can see that <code>bar()</code> was called once and spent 9 ticks running: 8 ticks in its own body and 1 tick in other functions.</p>
<p>After the element&#8217;s function comes the fan out.  The format is the same as for the fan in, but the number before the function name is the number of times the corresponding function was called from this element.  Here we see that <code>bar()</code> called <code>foo()</code> once.</p>
<p>The timings section starts with a row of equal signs and is pretty self-explanatory.  The section header establishes the connection between the ticks mentioned earlier and actual seconds.  It also states that all times from now on are specified in microseconds, of which there are 1&nbsp;000&nbsp;000 per second.</p>
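<p>As a rough sketch of how the two sections relate, ticks from the call graph can be converted to microseconds using the frequency from the section header.  The <code>ticksToMicros</code> function is a made-up name, and truncating integer division is my assumption: it happens to reproduce the numbers of the sample <var>trace.log</var> above, but I have not checked how the profiler itself rounds.</p>
<pre class="brush: cpp;">
// Convert profiler ticks to whole microseconds, truncating the remainder.
ulong ticksToMicros(ulong ticks, ulong ticksPerSec)
{
    return ticks * 1_000_000 / ticksPerSec;
}

void main()
{
    const ulong freq = 3579545;            // from the "Timer Is ... Ticks/Sec" header
    assert(ticksToMicros(25, freq) == 6);  // main(): tree time
    assert(ticksToMicros(15, freq) == 4);  // main(): func time
    assert(ticksToMicros(9, freq) == 2);   // bar(): tree time
    assert(ticksToMicros(8, freq) == 2);   // bar(): func time
    assert(ticksToMicros(2, freq) == 0);   // foo(): too short to register
}
</pre>
<p>This also explains why <code>foo()</code> shows zero time in the table: its 2 ticks amount to less than one microsecond.</p>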
<p>Data in this section is split into 5 columns: the number of times the function was called, the total time spent in this function and all functions called from it, the total time spent in this function only, the average time consumed by this function per call, and the function&#8217;s mangled name.  This table is sorted by the 3rd column so that functions with the largest pure execution times are at the top.  In our example, the most complex function is <code>main()</code> because it contains two function calls and therefore executes longest.</p>
<h3>trace.def</h3>
<p>This file, as its extension suggests, is a skeleton of a module definition file.  It allows <a href="http://www.digitalmars.com/ctg/optlink.html">OPTLINK</a> to arrange functions within the executable in the order they actually call each other, reducing the number of potential cache misses and long jump instructions.</p>
<p>This file is not readily usable, however.  At a minimum, you must add <code>EXETYPE NT</code> at the start of the file to denote that it&#8217;s for a 32-bit executable.  Now we can use it to build an optimized <var>hello</var>:</p>
<pre class="aside">dmd hello.d trace.def</pre>
<blockquote><p>
<strong>Note for Tango users:</strong> this method does not work if your Tango installation adds <var>tango-user-dmd.lib</var> to the <var>sc.ini</var> as <code>-L+tango-user-dmd.lib</code>.  This method of Tango installation assumes that the last group of linker options is always a list of libraries.  However, when a <var>.def</var> file is used, another group of options is added to the linker command line so that the <var>tango-user-dmd.lib</var> is treated as a resource, not a library.  I don&#8217;t have a perfect solution for this.  You may want to resort to <a href="http://www.dsource.org/projects/tango/wiki/WindowsInstall#ManualBuildandInstall">other methods</a> of linking to <var>tango-user-dmd.lib</var> on Windows.
</p></blockquote>
<h3>Afterword</h3>
<p>Here I&#8217;ve covered basic technical aspects of profiling with DMD.  But it is also important to know how to <em>use</em> information provided by the profiling process.  In my next post I will profile <a href="http://www.dsource.org/projects/blaze">Blaze</a>, a 2D physics library, and discuss some pitfalls awaiting you on the way of profiling.</p>
]]></content:encoded>
			<wfw:commentRss>http://snakecoder.wordpress.com/2009/02/26/profiling-with-dmd-on-windows/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e2d0143487b4b6550a03ed6acfa70a71?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">SnakE</media:title>
		</media:content>
	</item>
	</channel>
</rss>
