I measured the speed of the following program with each combination of Win32/Win64 and Debug/Release:
TEST 1)
var
  i: integer;
  x, y: int64;
begin
  x := 0;
  y := 0;
  for i := 0 to MaxLongInt - 1 do
  begin
    Inc(x, y);
    Inc(y);
  end;
end;
And the results, in milliseconds:
      | Debug | Release |
Win32 |  6703 |    5395 |
Win64 |  4005 |    1380 |
Clearly, the Win64 Release build is almost four times faster than the optimized 32-bit build, while the gap between the non-optimized Win32 and Win64 builds is much smaller (roughly 1.7x).
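The post does not show how the times were taken. One way such a measurement could be set up is sketched below. The function name RunTest1, the Iterations parameter and the use of Now with MilliSecondsBetween are assumptions for illustration, not necessarily the harness used for the numbers above.

```pascal
program TimeInt64Loop;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$Q-,R-} // overflow and range checks off (assumed here; check your own build settings)

uses
  SysUtils, DateUtils; // System.SysUtils, System.DateUtils when unit scope names are required

// The TEST 1 loop wrapped in a function (hypothetical name), so the result
// is actually used and the iteration count can be lowered for a quick trial.
function RunTest1(Iterations: Integer): Int64;
var
  i: Integer;
  x, y: Int64;
begin
  x := 0;
  y := 0;
  for i := 0 to Iterations - 1 do
  begin
    Inc(x, y);
    Inc(y);
  end;
  Result := x;
end;

var
  StartTime: TDateTime;
  Sum: Int64;
begin
  StartTime := Now;
  Sum := RunTest1(MaxLongInt);
  WriteLn('Result:  ', Sum);
  WriteLn('Elapsed: ', MilliSecondsBetween(Now, StartTime), ' ms');
end.
```

The system clock behind Now is fairly coarse, which is acceptable for runs lasting seconds. For shorter runs a high-resolution timer would be a better choice.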
TEST 2)
Another test, this time involving multiplications:
var
  i: integer;
  x, y: int64;
begin
  x := 0;
  y := 0;
  for i := 0 to MaxLongInt - 1 do
  begin
    x := x * y;
    Inc(y, i);
  end;
end;
And again the results, in milliseconds:
      | Debug | Release |
Win32 | 14368 |   14461 |
Win64 |  5718 |    2398 |
The optimized Win64 build is about six times faster than the optimized Win32 build, and there is no real difference between the optimized and non-optimized Win32 builds. The conclusion is that Delphi does not know how to optimize 64-bit integer operations when compiling in 32-bit mode.
But they seem to be doing a good job with the 64-bit optimizations. Let's see if I'm able to compare with other compilers.
In Win32, all CPU registers are 32 bits wide (EVEN on a 64-bit processor, because the CPU is switched to 32-bit compatibility mode!), so Delphi has to use the EDX:EAX register pair to simulate 64-bit integer operations. You use Int64 multiplications, which can't be done directly on 32-bit CPU hardware, so Delphi calls the _llmul function to multiply two 64-bit values. This is VERY time consuming. Plus, storing the result into variable x (a memory operation!) requires storing EAX and EDX separately. In Win32, the code for the line "x := x * y;" looks roughly like:
PUSH x.low       ; lower 32 bits of x; these four instructions put x and y on the stack as _llmul input parameters
PUSH x.high      ; upper 32 bits of x; all four are memory writes, they pollute the CPU cache, so quite slow!
PUSH y.low
PUSH y.high
CALL _llmul      ; call the function that multiplies two 64-bit integers
MOV x.low,EAX    ; store the result into x, two instructions
MOV x.high,EDX
In Win64, all 64-bit integer operations are done in ONE 64-bit register (like RAX), so storing the result into memory takes only one instruction.
The same code, for x := x * y, would look like:
MOV RAX,x        ; notice: no memory writes are needed to set up the multiplication of two 64-bit integers!
IMUL RAX,y       ; multiply RAX by y, using the CPU's built-in hardware for 64-bit multiplication
MOV x,RAX        ; store the result into x
As you can see, on Win32 the multiplication of two 64-bit integers requires lots of EXTRA work! It can't be done any other way! It's like using a car engine to lift a plane :)
Delphi optimizes code very well on both platforms. The difference is there because a 64-bit processor can handle larger integers!
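To make the extra work concrete, here is a rough sketch of how the low 64 bits of a product can be assembled from 32-bit halves. It only illustrates the idea and is not the actual _llmul source. MulViaHalves and the sample values are made up for this example, and unsigned types are used because the low 64 bits of the product are the same for signed and unsigned operands.

```pascal
program Mul64FromHalves;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$Q-,R-} // wrap-around arithmetic is intended here

// Hypothetical helper: low 64 bits of A * B, built from 32-bit halves.
function MulViaHalves(A, B: UInt64): UInt64;
var
  ALo, AHi, BLo, BHi: UInt64; // each one holds a 32-bit value
begin
  ALo := A and $FFFFFFFF;
  AHi := A shr 32;
  BLo := B and $FFFFFFFF;
  BHi := B shr 32;

  // Three 32x32 products are needed. The cross terms are shifted into
  // the upper half, and AHi * BHi would land above bit 63, so it is dropped.
  Result := ALo * BLo + ((AHi * BLo + ALo * BHi) shl 32);
end;

var
  A, B: UInt64;
begin
  A := 123456789012345;
  B := 987654321;
  // Both values should agree, including the wrap-around past 64 bits.
  WriteLn(MulViaHalves(A, B));
  WriteLn(A * B);
end.
```

On a 32-bit target each of those three products is a separate multiply instruction, followed by additions and register shuffling, whereas a 64-bit target needs a single IMUL.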
Hope this helps you a bit.
Two thumbs up for your test! Try changing all Int64 variables to the Integer type. To prevent overflow, the loop variable i should go no higher than 2^16-1 (65535). Run the test. I believe the difference should be much smaller than what you measured. You might mail me the findings.
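The suggested experiment might look like the sketch below. The outer Repeats loop is an addition that is not part of the suggestion: 65536 iterations alone finish too quickly to time, so the inner loop is repeated 32768 times to reach 2^31 iterations, close to the original tests. RunTest2Integer and the timing calls are likewise assumptions for illustration.

```pascal
program Test2WithInteger;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$Q-,R-} // overflow and range checks off (assumed)

uses
  SysUtils, DateUtils; // System.SysUtils, System.DateUtils when unit scope names are required

// TEST 2 with 32-bit Integer variables (hypothetical name).
function RunTest2Integer(Repeats: Integer): Integer;
var
  r, i: Integer;
  x, y: Integer;
begin
  x := 0;
  for r := 1 to Repeats do
  begin
    y := 0; // restart y so that it never exceeds the 32-bit range
    for i := 0 to 65535 do
    begin
      x := x * y;
      Inc(y, i); // ends at 65535 * 65536 / 2 = 2147450880, below MaxLongInt
    end;
  end;
  Result := x;
end;

var
  StartTime: TDateTime;
  Value: Integer;
begin
  StartTime := Now;
  Value := RunTest2Integer(32768); // 32768 * 65536 = 2^31 iterations
  WriteLn('Result:  ', Value);
  WriteLn('Elapsed: ', MilliSecondsBetween(Now, StartTime), ' ms');
end.
```

As in the original TEST 2, x starts at 0 and therefore stays 0, so the loop measures the cost of the multiply instruction rather than producing an interesting value.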
Igor
musketir@hotmail.com
Thanks for your feedback Igor, but this is nothing new to me... (check the other posts)
The point is that there is NO difference between the optimized (Release) and non-optimized (Debug) builds in 32 bits when using 64-bit variables, nothing else.