Difference between revisions of "Dhrystone MIPS 2.1"

From IGEP - ISEE Wiki

= Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules =

Reinhold P. Weicker<br>Siemens AG, E STE 35<br>Postfach 3240<br>D-8520 Erlangen<br>Germany (West)

== Why a Version 2 of Dhrystone? ==
The Dhrystone benchmark program [1] has become a popular benchmark for CPU/compiler performance measurement, in particular in the area of minicomputers, workstations, PCs and microprocessors. It apparently satisfies a need for an easy-to-use integer benchmark; it gives a first performance indication which is more meaningful than MIPS numbers which, in their literal meaning (million instructions per second), cannot be used across different instruction sets (e.g. RISC vs. CISC). With the increasing use of the benchmark, it seems necessary to reconsider the benchmark and to check whether it can still fulfill this function. Version 2 of Dhrystone is the result of such a re-evaluation; it has been made for two reasons:

a) Dhrystone has been published in Ada [1], and versions in Ada, Pascal and C have been distributed by Reinhold Weicker via floppy disk. However, the version that was used most often for benchmarking has been the version made by Rick Richardson by another translation from the Ada version into the C programming language; this has been the version distributed via the UNIX network Usenet [2].

There is an obvious need for a common C version of Dhrystone, since C is at present the most popular system programming language for the class of systems (microcomputers, minicomputers, workstations) where Dhrystone is used most. There should be, as far as possible, only one C version of Dhrystone such that results can be compared without restrictions. In the past, the C versions distributed by Rick Richardson (Version 1.1) and by Reinhold Weicker had small (though not significant) differences.

Together with the new C version, the Ada and Pascal versions have been updated as well.

b) As far as it is possible without changes to the Dhrystone statistics, optimizing compilers should be prevented from removing significant statements. It has turned out in the past that optimizing compilers suppressed code generation for too many statements (by "dead code removal" or "dead variable elimination"). This has led to the danger that benchmarking results obtained by a naive application of Dhrystone - without inspection of the code that was generated - could become meaningless.

The overall policy for version 2 has been that the distribution of statements, operand types and operand locality described in [1] should remain unchanged as much as possible. (Very few changes were necessary; their impact should be negligible.) Also, the order of statements should remain unchanged. Although I am aware of some critical remarks on the benchmark - I agree with several of them - and know some suggestions for improvement, I didn't want to change the benchmark into something different from what has become known as "Dhrystone"; the confusion generated by such a change would probably outweigh the benefits. If I were to write a new benchmark program, I wouldn't give it the name "Dhrystone" since this denotes the program published in [1]. However, I do recognize the need for a larger number of representative programs that can be used as benchmarks; users should always be encouraged to use more than just one benchmark.

The new versions (version 2.1 for C, Pascal and Ada) will be distributed as widely as possible. (Version 2.1 differs from version 2.0 distributed via the UNIX network Usenet in March 1988 only in a few corrections for minor deficiencies found by users of version 2.0.) Readers who want to use the benchmark for their own measurements can obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX format) from the author.

== Overall Characteristics of Version 2 ==

In general, version 2 follows - in the parts that are significant for performance measurement, i.e. within the measurement loop - the published (Ada) version and the C versions previously distributed. Where the versions distributed by Rick Richardson [2] and Reinhold Weicker have been different, it follows the version distributed by Reinhold Weicker. (However, the differences have been so small that their impact on execution time in all likelihood has been negligible.) The initialization and UNIX instrumentation part - which had been omitted in [1] - follows mostly the ideas of Rick Richardson [2]. However, any changes in the initialization part and in the printing of the result have no impact on performance measurement since they are outside the measurement loop. As a concession to older compilers, names have been made unique within the first 8 characters for the C version.

The original publication of Dhrystone did not contain any statements for time measurement since they are necessarily system-dependent. However, it turned out that it is not enough just to enclose the main procedure of Dhrystone in a loop and to measure the execution time. If the variables that are computed are not used somehow, there is the danger that the compiler considers them as "dead variables" and suppresses code generation for a part of the statements. Therefore in version 2 all variables of "main" are printed at the end of the program. This also permits some plausibility control for correct execution of the benchmark.

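The effect can be sketched with a small C fragment. This is an illustration only, not code from the benchmark; the names <code>run_loop</code> and <code>report</code> are hypothetical. If a caller never used the value that <code>run_loop</code> computes, an optimizing compiler could treat it as dead and drop the computation, so the sketch passes the result to a print routine, in the same spirit as printing the variables of "main".

```c
#include <stdio.h>

/* Hypothetical stand-in for a measurement loop: the loop body
   updates a variable whose final value depends on every iteration. */
long run_loop(long iterations)
{
    long acc = 0;
    long i;

    for (i = 1; i <= iterations; i++) {
        acc = acc + i % 7;
    }
    return acc;
}

/* Printing the result makes the value observable, so the compiler
   cannot simply discard the computation as dead. The printed value
   also serves as a plausibility check. */
void report(long iterations)
{
    long result = run_loop(iterations);
    printf("result after %ld iterations: %ld\n", iterations, result);
}
```

Printing does not stop a compiler from simplifying the computation in other ways, which is why inspecting the generated code is still advisable.
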
At several places in the benchmark, code has been added, but only in branches that are not executed. The intention is that optimizing compilers should be prevented from moving code out of the measurement loop, or from removing code altogether. Statements that are executed have been changed in very few places only. In these cases, only the role of some operands has been changed, and it was made sure that the numbers defining the "Dhrystone distribution" (distribution of statements, operand types and locality) still hold as much as possible. Except for sophisticated optimizing compilers, execution times for version 2.1 should be the same as for previous versions.

Because of the self-imposed limitation that the order and distribution of the executed statements should not be changed, there are still cases where optimizing compilers may not generate code for some statements. To a certain degree, this is unavoidable for small synthetic benchmarks. Users of the benchmark are advised to check code listings to verify that code is generated for all statements of Dhrystone.

Contrary to the suggestion in the published paper and its realization in the versions previously distributed, no attempt has been made to subtract the time for the measurement loop overhead. (This calculation has proven difficult to implement in a correct way, and its omission makes the program simpler.) However, since the loop check is now part of the benchmark, this does have an impact - though a very minor one - on the distribution statistics which have been updated for this version.

  
== Discussion of Individual Changes ==

In this section, all changes are described that affect the measurement loop and that are not just renamings of variables. All remarks refer to the C version; the other language versions have been updated similarly.

In addition to adding the measurement loop and the printout statements, changes have been made at the following places:

*In procedure "main", three statements have been added in the non-executed "then" part of the statement
<pre>if (Enum_Loc == Func_1 (Ch_Index, 'C'))</pre>
They are
<pre>strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
Int_2_Loc = Run_Index;
Int_Glob = Run_Index;</pre>
The string assignment prevents movement of the preceding assignment to Str_2_Loc (5th statement of "main") out of the measurement loop. (This probably will not happen for the C version, but it did happen with another language and compiler.) The assignment to Int_2_Loc prevents value propagation for Int_2_Loc, and the assignment to Int_Glob makes the value of Int_Glob possibly dependent on the value of Run_Index.

*In the three arithmetic computations at the end of the measurement loop in "main", the role of some variables has been exchanged, to prevent the division from just cancelling out the multiplication as it did in [1]. A very smart compiler might have recognized this and suppressed code generation for the division.
*For Proc_2, no code has been changed, but the values of the actual parameter have changed due to changes in "main".
*In Proc_4, the second assignment has been changed from
<pre>Bool_Loc = Bool_Loc | Bool_Glob;</pre>
to
<pre>Bool_Glob = Bool_Loc | Bool_Glob;</pre>
It now assigns a value to a global variable instead of a local variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not used afterwards.

*In Func_1, the statement
<pre>Ch_1_Glob = Ch_1_Loc;</pre>
was added in the non-executed "else" part of the "if" statement, to prevent the suppression of code generation for the assignment to Ch_1_Loc.

  
*In Func_2, the second character comparison statement has been changed to
<pre>if (Ch_Loc == 'R')</pre>
('R' instead of 'X') because a comparison with 'X' is implied in the preceding "if" statement.

Also in Func_2, the statement
<pre>Int_Glob = Int_Loc;</pre>
has been added in the non-executed part of the last "if" statement, in order to prevent Int_Loc from becoming a dead variable.

  
*In Func_3, a non-executed "else" part has been added to the "if" statement.

While the program would not be incorrect without this "else" part, it is considered bad programming practice if a function can be left without a return value.

To compensate for this change, the (non-executed) "else" part in the "if" statement of Proc_3 was removed.

The distribution statistics have been changed only by the addition of the measurement loop iteration (1 additional statement, 4 additional local integer operands) and by the change in Proc_4 (one operand changed from local to global). The distribution statistics in the comment headers have been updated accordingly.

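The change to the arithmetic computations in "main" can be illustrated with a short sketch. This is not a quotation of the benchmark source: the function names are hypothetical, and the first function only mimics the kind of pattern described above (a division that undoes the preceding multiplication), while the second uses a divisor that is independent of the multiplication.

```c
/* Pattern in which the division cancels the multiplication: the
   quotient equals b again (for a != 0 and no overflow), so a smart
   compiler could skip the division altogether. */
int cancelling(int a, int b)
{
    int c = b * a;   /* multiplication           */
    b = c / a;       /* division just undoes it  */
    return 7 * (c - b) - a;
}

/* Pattern with the roles of the operands exchanged: the divisor c is
   unrelated to the multiplication, so the division must be computed. */
int non_cancelling(int a, int b, int c)
{
    b = b * a;       /* multiplication                   */
    a = b / c;       /* division by an independent value */
    return 7 * (b - c) - a;
}
```

In the second form the compiler has no algebraic identity to exploit, so code for the division has to be generated.
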
== String Operations ==

The string operations (string assignment and string comparison) have not been changed, to keep the program consistent with the original version.

There has been some concern that the string operations are over-represented in the program, and that execution time is dominated by these operations. This was true in particular when optimizing compilers removed too much code in the main part of the program; this should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone was first published in Ada, and with Ada or Pascal semantics, the time spent in the string operations is, at least in all implementations known to me, considerably smaller. In Ada and Pascal, assignment and comparison of strings are operators defined in the language, and the upper bounds of the strings occurring in Dhrystone are part of the type information known at compilation time. The compilers can therefore generate efficient inline code. In C, string assignment and comparison are not part of the language, so the string operations must be expressed in terms of the C library functions "strcpy" and "strcmp". (ANSI C allows an implementation to use inline code for these functions.) In addition to the overhead caused by additional function calls, these functions are defined for null-terminated strings where the length of the strings is not known at compilation time; the function has to check every byte for the termination condition (the null byte).

Obviously, a C library which includes efficiently coded "strcpy" and "strcmp" functions helps to obtain good Dhrystone results. However, I don't think that this is unfair since string functions do occur quite frequently in real programs (editors, command interpreters, etc.). If the string functions are implemented efficiently, this helps real programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after scanning 20 characters) than most string comparisons in real programs. For consistency with the original benchmark, I didn't change the program despite this weakness.

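The cost of a C string comparison can be made concrete with a small sketch. The function <code>count_compared</code> below is hypothetical and is not part of the benchmark or of the C library; it walks two null-terminated strings the way a simple byte-wise "strcmp" would and reports how many character positions were examined. The two string literals in the note after the code are assumed to follow the pattern of the "3'RD STRING" literal quoted earlier.

```c
#include <stddef.h>

/* Number of character positions a simple byte-wise comparison
   examines before it can decide: it stops at the first mismatch
   or at the terminating null byte, whichever comes first. */
size_t count_compared(const char *s1, const char *s2)
{
    size_t n = 0;

    for (;;) {
        n++;                          /* one more position examined */
        if (s1[n - 1] != s2[n - 1] || s1[n - 1] == '\0') {
            return n;
        }
    }
}
```

For two strings such as "DHRYSTONE PROGRAM, 1'ST STRING" and "DHRYSTONE PROGRAM, 2'ND STRING", the first 19 characters agree and the 20th differs, so <code>count_compared</code> returns 20, which agrees with the figure given above.
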
== Intended Use of Dhrystone ==

When Dhrystone is used, the following "ground rules" apply:

*Separate compilation (Ada and C versions)

As mentioned in [1], Dhrystone was written to reflect actual programming practice in systems programming. The division into several compilation units (5 in the Ada version, 2 in the C version) is intended, as is the distribution of inter-module and intra-module subprogram calls. Although on many systems there will be no difference in execution time to a Dhrystone version where all compilation units are merged into one file, the rule is that separate compilation should be used. The intention is that real programming practice, where programs consist of several independently compiled units, should be reflected. This also implies that the compiler, while compiling one unit, has no information about the use of variables, register allocation etc. occurring in other compilation units. Although in real life compilation units will probably be larger, the intention is that these effects of separate compilation are modeled in Dhrystone.

A few language systems have post-linkage optimization available (e.g., final register allocation is performed after linkage). This is a borderline case: Post-linkage optimization involves additional program preparation time (although not as much as compilation in one unit) which may prevent its general use in practical programming. I think that since it defeats the intentions given above, it should not be used for Dhrystone.

Unfortunately, ISO/ANSI Pascal does not contain language features for separate compilation. Although most commercial Pascal compilers provide separate compilation in some way, we cannot use it for Dhrystone since such a version would not be portable. Therefore, no attempt has been made to provide a Pascal version with several compilation units.

  
*No procedure merging

Although Dhrystone contains some very short procedures where execution would benefit from procedure merging (inlining, macro expansion of procedures), procedure merging is not to be used. The reason is that the percentage of procedure and function calls is part of the "Dhrystone distribution" of statements contained in [1]. This restriction does not hold for the string functions of the C version since ANSI C allows an implementation to use inline code for these functions.

*Other optimizations are allowed, but they should be indicated

It is often hard to draw an exact line between "normal code generation" and "optimization" in compilers: Some compilers perform operations by default that are invoked in other compilers only when optimization is explicitly requested. Also, we cannot prevent people from trying to achieve benchmark results that look as good as possible. Therefore, optimizations performed by compilers - other than those listed above - are not forbidden when Dhrystone execution times are measured. Dhrystone is not intended to be non-optimizable but is intended to be similarly optimizable as normal programs. For example, there are several places in Dhrystone where performance benefits from optimizations like common subexpression elimination, value propagation etc., but normal programs usually also benefit from these optimizations. Therefore, no effort was made to artificially prevent such optimizations. However, measurement reports should indicate which compiler optimization levels have been used, and reporting results with different levels of compiler optimization for the same hardware is encouraged.

*Default results are those without "register" declarations (C version)

When Dhrystone results are quoted without additional qualification, they should be understood as results obtained without use of the "register" attribute. Good compilers should be able to make good use of registers even without explicit register declarations ([3], p. 193).

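One way to support both measurement variants from a single source is a macro that expands either to the "register" keyword or to nothing. The sketch below is an assumption about how this could be arranged; the macro names <code>USE_REGISTER</code> and <code>REG</code> and the function <code>sum_to</code> are illustrative and are not taken from the text above.

```c
/* Define USE_REGISTER at compile time to request register variables;
   the default build leaves REG empty, i.e. no "register" hint. */
#ifdef USE_REGISTER
#define REG register
#else
#define REG
#endif

/* Sums 1..n; the locals carry the optional storage-class hint. */
int sum_to(int n)
{
    REG int i;
    REG int total = 0;

    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Results from a build with the hint enabled would then be reported as a qualified variant, while the default build corresponds to the unqualified figure.
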
Of course, for experimental purposes, post-linkage optimization, procedure merging and/or compilation in one unit can be done to determine their effects. However, Dhrystone numbers obtained under these conditions should be explicitly marked as such; "normal" Dhrystone results should be understood as results obtained following the ground rules listed above.

In any case, for serious performance evaluation, users are advised to ask for code listings and to check them carefully. In this way, when results for different systems are compared, the reader can get a feeling for how much of the performance difference is due to compiler optimization and how much is due to hardware speed.

  
== Acknowledgements ==

The C version 2.1 of Dhrystone has been developed in cooperation with Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the "Version 1.1" distributed previously by him over the UNIX network Usenet. Through his activity with Usenet, Rick Richardson has made a very valuable contribution to the dissemination of the benchmark. I also thank Chaim Benedelac (National Semiconductor), David Ditzel (SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael Saavedra-Barrera (UC at Berkeley) for their help with comments on earlier versions of the benchmark.

  
  As mentioned in [1], Dhrystone was written to  reflect  actual  programming
+
== <br>Bibliography ==
  practice  in  systems  programming.  The  division into several compilation
 
  units (5 in the Ada version, 2 in the C version)  is  intended,  as  is  the
 
  distribution of inter-module and intra-module subprogram calls.  Although on
 
  many systems there will be no difference in execution time  to  a  Dhrystone
 
  version  where  all  compilation units are merged into one file, the rule is
 
  that separate compilation should  be  used.  The  intention  is  that  real
 
  programming  practice,  where  programs  consist  of  several  independently
 
  compiled units, should  be  reflected.  This  also  has  implies  that  the
 
  compiler,  while  compiling  one  unit,  has no information about the use of
 
  variables, register allocation etc.  occuring in  other  compilation  units.
 
  Although  in  real  life  compilation  units  will  probably  be larger, the
 
  intention is that these effects  of  separate  compilation  are  modeled  in
 
  Dhrystone.
 
  
  A few language systems have post-linkage optimization available (e.g., final
+
[1] Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming Benchmark. Communications of the ACM 27, 10 (Oct. 1984), 1013-1030
  register allocation is performed after linkage). This is a borderline case:
 
  Post-linkage  optimization  involves  additional  program  preparation  time
 
  (although  not  as  much  as  compilation in one unit) which may prevent its
 
  general use in practical programming.  I think that  since  it  defeats  the
 
  intentions given above, it should not be used for Dhrystone.
 
  
  Unfortunately, ISO/ANSI  Pascal  does  not  contain  language  features  for
+
[2]Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text) Informal Distribution via "Usenet", Last Version Known to me: Sept. 21, 1987
  separate  compilation.  Although  most  commercial Pascal compilers provide
 
  separate compilation in some way, we cannot use it for Dhrystone since  such
 
  a  version  would  not  be portable. Therefore, no attempt has been made to
 
  provide a Pascal version with several compilation units.
 
  
o No procedure merging
+
[3]Brian W. Kernighan and Dennis M. Ritchie: The C Programming Language. Prentice-Hall, Englewood Cliffs (NJ) 1978
  
  Although Dhrystone contains some very short procedures where execution would
+
<br>
  benefit  from  procedure  merging (inlining, macro expansion of procedures),
 
  procedure merging is not to be used.  The reason is that the  percentage  of
 
  procedure  and  function  calls  is  part of the "Dhrystone distribution" of
 
  statements contained in [1].  This restriction does not hold for the  string
 
  functions  of  the  C  version  since ANSI C allows an implementation to use
 
  inline code for these functions.
 
  
o Other optimizations are allowed, but they should be indicated
+
= IGEP Dhrystone 2.1 MIPS Test  =

== Test Software ==

You can download the Dhrystone 2.1 MIPS&nbsp;test from [http://downloads.isee.biz/pub/files/dhrystone-2.1.tar.gz here].

The software is compiled for OMAP / DM processors. The archive contains 2 executables:

*gcc_dry2reg<br>

<u>Tune Parameters:</u>

GCCOPTIM=&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -O

Compiler: Linaro &amp; Ubuntu
<pre>$ arm-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/arm-linux-gnueabi/include/c++/4.5.2 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb
--disable-werror --enable-checking=release --program-prefix=arm-linux-gnueabi- --includedir=/usr/arm-linux-gnueabi/include --build=i686-linux-gnu --host=i686-linux-gnu --target=arm-linux-gnueabi
--with-headers=/usr/arm-linux-gnueabi/include --with-libs=/usr/arm-linux-gnueabi/lib
Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu3)</pre>

*cc_dry2reg<br>

<u>Tune Parameters:</u><br>

OPTIMIZE=&nbsp;&nbsp;&nbsp; -O4 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fno-tree-vectorize

Compiler: Linaro &amp; Ubuntu<br>
<pre>$ arm-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/arm-linux-gnueabi/include/c++/4.5.2 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb
--disable-werror --enable-checking=release --program-prefix=arm-linux-gnueabi- --includedir=/usr/arm-linux-gnueabi/include --build=i686-linux-gnu --host=i686-linux-gnu --target=arm-linux-gnueabi
--with-headers=/usr/arm-linux-gnueabi/include --with-libs=/usr/arm-linux-gnueabi/lib
Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu3)</pre>

Calculation References:<br>

*[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/4160.html ARM Dhrystone reference].

== Test Case 1: IGEPv2 Revision C - DM3730 @ 1 GHz ==

<u>'''Board'''</u>: IGEPv2 Revision C - RC5 - DM3730 @ 1 GHz - 512 MBytes LPDDR RAM + 512 MBytes OneNand Flash<br>

<u>'''Operating System'''</u>: Linux version 2.6.35.13 (mcaro@manel-p) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu3) ) #3 Fri Jun 10 19:58:16 CEST 2011

<u>'''Boot Software:'''</u> IGEP X-Loader 2.0.1-2<br>

'''<u>a)&nbsp;Test case: Execution 10000000 loops (gcc_dry2)</u>'''<br>

<u>Result</u><br>

<pre>root@localhost:/tmp# ./gcc_dry2
Dhrystone Benchmark, Version 2.1 (Language: C)
Program compiled without 'register' attribute
Please give the number of runs through the benchmark: 10000000
Execution starts, 10000000 runs through Dhrystone
Execution ends
Final values of the variables used in the benchmark:
Int_Glob:            5
        should be:  5
Bool_Glob:          1
        should be:  1
Ch_1_Glob:          A
        should be:  A
Ch_2_Glob:          B
        should be:  B
Arr_1_Glob[8]:      7
        should be:  7
Arr_2_Glob[8][7]:    10000010
        should be:  Number_Of_Runs + 10
Ptr_Glob-&gt;
  Ptr_Comp:          13295624
        should be:  (implementation-dependent)
  Discr:            0
        should be:  0
  Enum_Comp:        2
        should be:  2
  Int_Comp:          17
        should be:  17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:  DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob-&gt;
  Ptr_Comp:          13295624
        should be:  (implementation-dependent), same as above
  Discr:            0
        should be:  0
  Enum_Comp:        1
        should be:  1
  Int_Comp:          18
        should be:  18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:  DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:          5
        should be:  5
Int_2_Loc:          13
        should be:  13
Int_3_Loc:          7
        should be:  7
Enum_Loc:            1
        should be:  1
Str_1_Loc:          DHRYSTONE PROGRAM, 1'ST STRING
        should be:  DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:          DHRYSTONE PROGRAM, 2'ND STRING
        should be:  DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone:   0.4
Dhrystones per Second:                      2788671.0
</pre>

'''''DMIPS: 2788671.0 /&nbsp;1757 = 1587.17'''''

'''<u>b)&nbsp;Test case: Execution 10000000 loops (cc_dry2reg)</u>'''<br>

<u>Result</u><br>

<pre>root@localhost:/tmp# ./cc_dry2reg

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled with 'register' attribute

Please give the number of runs through the benchmark: 10000000

Execution starts, 10000000 runs through Dhrystone
Execution ends

Final values of the variables used in the benchmark:

Int_Glob:            5
        should be:  5
Bool_Glob:          1
        should be:  1
Ch_1_Glob:          A
        should be:  A
Ch_2_Glob:          B
        should be:  B
Arr_1_Glob[8]:      7
        should be:  7
Arr_2_Glob[8][7]:   10000010
        should be:  Number_Of_Runs + 10
Ptr_Glob-&gt;
  Ptr_Comp:          4169736
        should be:  (implementation-dependent)
  Discr:            0
        should be:  0
  Enum_Comp:        2
        should be:  2
  Int_Comp:          17
        should be:  17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:  DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob-&gt;
  Ptr_Comp:          4169736
        should be:  (implementation-dependent), same as above
  Discr:            0
        should be:  0
  Enum_Comp:        1
        should be:  1
  Int_Comp:          18
        should be:  18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:  DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:          5
        should be:  5
Int_2_Loc:          13
        should be:  13
Int_3_Loc:          7
        should be:  7
Enum_Loc:            1
        should be:  1
Str_1_Loc:          DHRYSTONE PROGRAM, 1'ST STRING
        should be:  DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:          DHRYSTONE PROGRAM, 2'ND STRING
        should be:  DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone:    0.3
Dhrystones per Second:                      3987539.0
</pre>

'''''DMIPS: 3987539.0 /&nbsp;1757 = 2269.51'''''<br>

[[Category:How_to_forge]]
[[Category:Software]]
[[Category:Software applications]]

Latest revision as of 16:31, 24 January 2014

Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

Reinhold P. Weicker
Siemens AG, E STE 35
Postfach 3240
D-8520 Erlangen
Germany (West)



Why a Version 2 of Dhrystone?

The Dhrystone benchmark program [1] has become a popular benchmark for CPU/compiler performance measurement, in particular in the area of minicomputers, workstations, PCs and microprocessors. It apparently satisfies a need for an easy-to-use integer benchmark; it gives a first performance indication which is more meaningful than MIPS numbers which, in their literal meaning (million instructions per second), cannot be used across different instruction sets (e.g. RISC vs. CISC). With the increasing use of the benchmark, it seems necessary to reconsider the benchmark and to check whether it can still fulfill this function. Version 2 of Dhrystone is the result of such a re-evaluation; it has been made for two reasons:

a) Dhrystone has been published in Ada [1], and versions in Ada, Pascal and C have been distributed by Reinhold Weicker via floppy disk. However, the version that was used most often for benchmarking has been the version made by Rick Richardson by another translation from the Ada version into the C programming language; this has been the version distributed via the UNIX network Usenet [2].

There is an obvious need for a common C version of Dhrystone, since C is at present the most popular system programming language for the class of systems (microcomputers, minicomputers, workstations) where Dhrystone is used most. There should be, as far as possible, only one C version of Dhrystone such that results can be compared without restrictions. In the past, the C versions distributed by Rick Richardson (Version 1.1) and by Reinhold Weicker had small (though not significant) differences.

Together with the new C version, the Ada and Pascal versions have been updated as well.

b) As far as it is possible without changes to the Dhrystone statistics, optimizing compilers should be prevented from removing significant statements. It has turned out in the past that optimizing compilers suppressed code generation for too many statements (by "dead code removal" or "dead variable elimination"). This has led to the danger that benchmarking results obtained by a naive application of Dhrystone - without inspection of the code that was generated - could become meaningless.

The overall policy for version 2 has been that the distribution of statements, operand types and operand locality described in [1] should remain unchanged as much as possible. (Very few changes were necessary; their impact should be negligible.) Also, the order of statements should remain unchanged. Although I am aware of some critical remarks on the benchmark - I agree with several of them - and know some suggestions for improvement, I didn't want to change the benchmark into something different from what has become known as "Dhrystone"; the confusion generated by such a change would probably outweigh the benefits. If I were to write a new benchmark program, I wouldn't give it the name "Dhrystone" since this denotes the program published in [1]. However, I do recognize the need for a larger number of representative programs that can be used as benchmarks; users should always be encouraged to use more than just one benchmark. The new versions (version 2.1 for C, Pascal and Ada) will be distributed as widely as possible. (Version 2.1 differs from version 2.0 distributed via the UNIX Network Usenet in March 1988 only in a few corrections for minor deficiencies found by users of version 2.0.) Readers who want to use the benchmark for their own measurements can obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX format) from the author.


Overall Characteristics of Version 2

In general, version 2 follows - in the parts that are significant for performance measurement, i.e. within the measurement loop - the published (Ada) version and the C versions previously distributed. Where the versions distributed by Rick Richardson [2] and Reinhold Weicker have been different, it follows the version distributed by Reinhold Weicker. (However, the differences have been so small that their impact on execution time in all likelihood has been negligible.) The initialization and UNIX instrumentation part - which had been omitted in [1] - follows mostly the ideas of Rick Richardson [2]. However, any changes in the initialization part and in the printing of the result have no impact on performance measurement since they are outside the measurement loop. As a concession to older compilers, names have been made unique within the first 8 characters for the C version.

The original publication of Dhrystone did not contain any statements for time measurement since they are necessarily system-dependent. However, it turned out that it is not enough just to enclose the main procedure of Dhrystone in a loop and to measure the execution time. If the variables that are computed are not used somehow, there is the danger that the compiler considers them as "dead variables" and suppresses code generation for a part of the statements. Therefore in version 2 all variables of "main" are printed at the end of the program. This also permits some plausibility control for correct execution of the benchmark.

At several places in the benchmark, code has been added, but only in branches that are not executed. The intention is that optimizing compilers should be prevented from moving code out of the measurement loop, or from removing code altogether. Statements that are executed have been changed in very few places only. In these cases, only the role of some operands has been changed, and it was made sure that the numbers defining the "Dhrystone distribution" (distribution of statements, operand types and locality) still hold as much as possible. Except for sophisticated optimizing compilers, execution times for version 2.1 should be the same as for previous versions.

Because of the self-imposed limitation that the order and distribution of the executed statements should not be changed, there are still cases where optimizing compilers may not generate code for some statements. To a certain degree, this is unavoidable for small synthetic benchmarks. Users of the benchmark are advised to check code listings whether code is generated for all statements of Dhrystone.

Contrary to the suggestion in the published paper and its realization in the versions previously distributed, no attempt has been made to subtract the time for the measurement loop overhead. (This calculation has proven difficult to implement in a correct way, and its omission makes the program simpler.) However, since the loop check is now part of the benchmark, this does have an impact - though a very minor one - on the distribution statistics which have been updated for this version.


Discussion of Individual Changes

In this section, all changes are described that affect the measurement loop and that are not just renamings of variables. All remarks refer to the C version; the other language versions have been updated similarly.

In addition to adding the measurement loop and the printout statements, changes have been made at the following places:

  • In procedure "main", three statements have been added in the non-executed "then" part of the statement
if (Enum_Loc == Func_1 (Ch_Index, 'C'))

they are

strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
Int_2_Loc = Run_Index;
Int_Glob = Run_Index;

The string assignment prevents movement of the preceding assignment to Str_2_Loc (5'th statement of "main") out of the measurement loop. (This probably will not happen for the C version, but it did happen with another language and compiler.) The assignment to Int_2_Loc prevents value propagation for Int_2_Loc, and the assignment to Int_Glob makes the value of Int_Glob possibly dependent on the value of Run_Index.

  • In the three arithmetic computations at the end of the measurement loop in "main", the role of some variables has been exchanged, to prevent the division from just cancelling out the multiplication as it was in [1]. A very smart compiler might have recognized this and suppressed code generation for the division.
  • For Proc_2, no code has been changed, but the values of the actual parameter have changed due to changes in "main".
  • In Proc_4, the second assignment has been changed from
Bool_Loc = Bool_Loc | Bool_Glob;

to

Bool_Glob = Bool_Loc | Bool_Glob;

It now assigns a value to a global variable instead of a local variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not used afterwards.

  • In Func_1, the statement
Ch_1_Glob = Ch_1_Loc;

was added in the non-executed "else" part of the "if" statement, to prevent the suppression of code generation for the assignment to Ch_1_Loc.

  • In Func_2, the second character comparison statement has been changed to
if (Ch_Loc == 'R')

('R' instead of 'X') because a comparison with 'X' is implied in the preceding "if" statement.

Also in Func_2, the statement

Int_Glob = Int_Loc;

has been added in the non-executed part of the last "if" statement, in order to prevent Int_Loc from becoming a dead variable.

  • In Func_3, a non-executed "else" part has been added to the "if" statement

While the program would not be incorrect without this "else" part, it is considered bad programming practice if a function can be left without a return value.

To compensate for this change, the (non-executed) "else" part in the "if" statement of Proc_3 was removed.

The distribution statistics have been changed only by the addition of the measurement loop iteration (1 additional statement, 4 additional local integer operands) and by the change in Proc_4 (one operand changed from local to global). The distribution statistics in the comment headers have been updated accordingly.


String Operations

The string operations (string assignment and string comparison) have not been changed, to keep the program consistent with the original version.

There has been some concern that the string operations are over-represented in the program, and that execution time is dominated by these operations. This was true in particular when optimizing compilers removed too much code in the main part of the program; this should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone was first published in Ada, and with Ada or Pascal semantics, the time spent in the string operations is, at least in all implementations known to me, considerably smaller. In Ada and Pascal, assignment and comparison of strings are operators defined in the language, and the upper bounds of the strings occurring in Dhrystone are part of the type information known at compilation time. The compilers can therefore generate efficient inline code. In C, string assignment and comparisons are not part of the language, so the string operations must be expressed in terms of the C library functions "strcpy" and "strcmp". (ANSI C allows an implementation to use inline code for these functions.) In addition to the overhead caused by additional function calls, these functions are defined for null-terminated strings where the length of the strings is not known at compilation time; the function has to check every byte for the termination condition (the null byte).

Obviously, a C library which includes efficiently coded "strcpy" and "strcmp" functions helps to obtain good Dhrystone results. However, I don't think that this is unfair since string functions do occur quite frequently in real programs (editors, command interpreters, etc.). If the string functions are implemented efficiently, this helps real programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after scanning 20 characters) than most string comparisons in real programs. For consistency with the original benchmark, I didn't change the program despite this weakness.


Intended Use of Dhrystone

When Dhrystone is used, the following "ground rules" apply:

  • Separate compilation (Ada and C versions)

As mentioned in [1], Dhrystone was written to reflect actual programming practice in systems programming. The division into several compilation units (5 in the Ada version, 2 in the C version) is intended, as is the distribution of inter-module and intra-module subprogram calls. Although on many systems there will be no difference in execution time to a Dhrystone version where all compilation units are merged into one file, the rule is that separate compilation should be used. The intention is that real programming practice, where programs consist of several independently compiled units, should be reflected. This also implies that the compiler, while compiling one unit, has no information about the use of variables, register allocation etc. occurring in other compilation units. Although in real life compilation units will probably be larger, the intention is that these effects of separate compilation are modeled in Dhrystone.

A few language systems have post-linkage optimization available (e.g., final register allocation is performed after linkage). This is a borderline case: Post-linkage optimization involves additional program preparation time (although not as much as compilation in one unit) which may prevent its general use in practical programming. I think that since it defeats the intentions given above, it should not be used for Dhrystone.

Unfortunately, ISO/ANSI Pascal does not contain language features for separate compilation. Although most commercial Pascal compilers provide separate compilation in some way, we cannot use it for Dhrystone since such a version would not be portable. Therefore, no attempt has been made to provide a Pascal version with several compilation units.

  • No procedure merging

Although Dhrystone contains some very short procedures where execution would benefit from procedure merging (inlining, macro expansion of procedures), procedure merging is not to be used. The reason is that the percentage of procedure and function calls is part of the "Dhrystone distribution" of statements contained in [1]. This restriction does not hold for the string functions of the C version since ANSI C allows an implementation to use inline code for these functions.

  • Other optimizations are allowed, but they should be indicated

It is often hard to draw an exact line between "normal code generation" and "optimization" in compilers: Some compilers perform operations by default that are invoked in other compilers only when optimization is explicitly requested. Also, we cannot avoid that in benchmarking people try to achieve results that look as good as possible. Therefore, optimizations performed by compilers - other than those listed above - are not forbidden when Dhrystone execution times are measured. Dhrystone is not intended to be non-optimizable but is intended to be similarly optimizable as normal programs. For example, there are several places in Dhrystone where performance benefits from optimizations like common subexpression elimination, value propagation etc., but normal programs usually also benefit from these optimizations. Therefore, no effort was made to artificially prevent such optimizations. However, measurement reports should indicate which compiler optimization levels have been used, and reporting results with different levels of compiler optimization for the same hardware is encouraged.

  • Default results are those without "register" declarations (C version)

When Dhrystone results are quoted without additional qualification, they should be understood as results obtained without use of the "register" attribute. Good compilers should be able to make good use of registers even without explicit register declarations ([3], p. 193).

Of course, for experimental purposes, post-linkage optimization, procedure merging and/or compilation in one unit can be done to determine their effects. However, Dhrystone numbers obtained under these conditions should be explicitly marked as such; "normal" Dhrystone results should be understood as results obtained following the ground rules listed above.

In any case, for serious performance evaluation, users are advised to ask for code listings and to check them carefully. In this way, when results for different systems are compared, the reader can get a feeling for how much of the performance difference is due to compiler optimization and how much is due to hardware speed.


Acknowledgements

The C version 2.1 of Dhrystone has been developed in cooperation with Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the "Version 1.1" distributed previously by him over the UNIX network Usenet. Through his activity with Usenet, Rick Richardson has made a very valuable contribution to the dissemination of the benchmark. I also thank Chaim Benedelac (National Semiconductor), David Ditzel (SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael Saavedra-Barrera (UC at Berkeley) for their help with comments on earlier versions of the benchmark.


Bibliography

[1] Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming Benchmark. Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2] Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text). Informal Distribution via "Usenet", Last Version Known to me: Sept. 21, 1987

[3] Brian W. Kernighan and Dennis M. Ritchie: The C Programming Language. Prentice-Hall, Englewood Cliffs (NJ) 1978


IGEP Dhrystone 2.1 MIPS Test

Test Software

You can download the Dhrystone 2.1 MIPS test from here.

The software is compiled for OMAP / DM processors. The package contains 2 executables:

  • gcc_dry2reg

Tune Parameters:

GCCOPTIM=       -O

Compiler: Linaro & Ubuntu

$ arm-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --with-gxx-include-dir=/usr/arm-linux-gnueabi/include/c++/4.5.2 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb 
--disable-werror --enable-checking=release --program-prefix=arm-linux-gnueabi- --includedir=/usr/arm-linux-gnueabi/include --build=i686-linux-gnu --host=i686-linux-gnu --target=arm-linux-gnueabi 
--with-headers=/usr/arm-linux-gnueabi/include --with-libs=/usr/arm-linux-gnueabi/lib
Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu3)

  • cc_dry2reg

Tune Parameters:

OPTIMIZE=    -O4 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fno-tree-vectorize

Compiler: Linaro & Ubuntu

Same toolchain as for gcc_dry2reg above: arm-linux-gnueabi-gcc, gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu3), target arm-linux-gnueabi.

Calculation References:

Test Case 1: IGEPv2 Revision C - DM3730 @ 1 GHz

Board: IGEPv2 Revision C - RC5 - DM3730 @ 1 GHz - 512 MBytes LPDDR RAM + 512 MBytes OneNAND Flash

Operating System: Linux version 2.6.35.13 (mcaro@manel-p) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu3) ) #3 Fri Jun 10 19:58:16 CEST 2011

Boot Software: IGEP X-Loader 2.0.1-2

a) Test case: execution of 10000000 loops (gcc_dry2)

Result

root@localhost:/tmp# ./gcc_dry2
Dhrystone Benchmark, Version 2.1 (Language: C)
Program compiled without 'register' attribute
Please give the number of runs through the benchmark: 10000000
Execution starts, 10000000 runs through Dhrystone
Execution ends
Final values of the variables used in the benchmark:
Int_Glob:            5
        should be:   5
Bool_Glob:           1
        should be:   1
Ch_1_Glob:           A
        should be:   A
Ch_2_Glob:           B
        should be:   B
Arr_1_Glob[8]:       7
        should be:   7
Arr_2_Glob[8][7]:    10000010
        should be:   Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp:          13295624
        should be:   (implementation-dependent)
  Discr:             0
        should be:   0
  Enum_Comp:         2
        should be:   2
  Int_Comp:          17
        should be:   17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp:          13295624
        should be:   (implementation-dependent), same as above
  Discr:             0
        should be:   0
  Enum_Comp:         1
        should be:   1
  Int_Comp:          18
        should be:   18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:           5
        should be:   5
Int_2_Loc:           13
        should be:   13
Int_3_Loc:           7
        should be:   7
Enum_Loc:            1
        should be:   1
Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
        should be:   DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
        should be:   DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone:    0.4
Dhrystones per Second:                      2788671.0

DMIPS: 2788671.0 / 1757 = 1587.17
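
The divisor 1757 is the Dhrystones-per-second score conventionally attributed to the VAX 11/780, the reference machine rated at 1 MIPS, so the quotient expresses the result in multiples of that machine. The sketch below works through the arithmetic behind the figures reported on this page. The function and macro names are hypothetical, chosen for this example only, and the clock value passed to `dmips_per_mhz` is an input supplied by the caller, not something the benchmark measures.

```c
#include <assert.h>

/* Dhrystones per second of the VAX 11/780 reference machine,
   i.e. the divisor used in the DMIPS lines on this page. */
#define VAX_11_780_DHRYSTONES_PER_SEC 1757.0

/* Convert a "Dhrystones per Second" figure into DMIPS. */
double dmips(double dhrystones_per_second)
{
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SEC;
}

/* Normalize DMIPS by the CPU clock, given in MHz (assumed input). */
double dmips_per_mhz(double dhrystones_per_second, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return dmips(dhrystones_per_second) / clock_mhz;
}

/* Time for one pass through the benchmark, in microseconds. */
double microseconds_per_run(double dhrystones_per_second)
{
    assert(dhrystones_per_second > 0.0);
    return 1.0e6 / dhrystones_per_second;
}
```

For example, `dmips(2788671.0)` reproduces the figure above up to rounding in the last digit, and `dmips_per_mhz(2788671.0, 1000.0)` gives roughly 1.59 DMIPS/MHz for the 1 GHz DM3730. `microseconds_per_run(2788671.0)` is about 0.36, which the benchmark prints rounded to 0.4.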

b) Test case: execution of 10000000 loops (cc_dry2reg)

Result

root@localhost:/tmp# ./cc_dry2reg
Dhrystone Benchmark, Version 2.1 (Language: C)
Program compiled with 'register' attribute
Please give the number of runs through the benchmark: 10000000
Execution starts, 10000000 runs through Dhrystone
Execution ends
Final values of the variables used in the benchmark:
Int_Glob:            5
        should be:   5
Bool_Glob:           1
        should be:   1
Ch_1_Glob:           A
        should be:   A
Ch_2_Glob:           B
        should be:   B
Arr_1_Glob[8]:       7
        should be:   7
Arr_2_Glob[8][7]:    10000010
        should be:   Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp:          4169736
        should be:   (implementation-dependent)
  Discr:             0
        should be:   0
  Enum_Comp:         2
        should be:   2
  Int_Comp:          17
        should be:   17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp:          4169736
        should be:   (implementation-dependent), same as above
  Discr:             0
        should be:   0
  Enum_Comp:         1
        should be:   1
  Int_Comp:          18
        should be:   18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:           5
        should be:   5
Int_2_Loc:           13
        should be:   13
Int_3_Loc:           7
        should be:   7
Enum_Loc:            1
        should be:   1
Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
        should be:   DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
        should be:   DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone:    0.3
Dhrystones per Second:                      3987539.0

DMIPS: 3987539.0 / 1757 = 2269.51