The problem
The TClientDataSet certainly isn’t known for its speed. Its slowness is directly related to the number of records it holds: it gets slower as more records are entered. It has never had much attention from Borland/CodeGear/Embarcadero, and the classic response I see on QC is “why are you using it with so many records? It wasn’t designed for that… you have designed your application badly… it’s not our fault…!!”
Andreas Hausladen (see Reason for Midas slow down and MidasSpeedFix.pas download) certainly proved that it is possible to use Midas with lots of records without an exponential slowdown. Now if you want to cache 600MB of records in memory you can.
Andy did some great work here particularly without even having access to the source code. I am very glad to see Embarcadero taking notice and applying his fixes to the latest Midas (as of D2009 Update 3 I believe). I am even more happy that they decided to release the source code to Midas alongside D2010. Now we can all help speed up what is such an important part of the database framework and relied upon by so many.
For a while now I have used Midas with Andy’s patch and been very thankful for the huge improvement in speed. I have only one last “issue” with Midas, which is that its string type fields use a fixed amount of memory regardless of the value stored in them…
If you define a string field of 1000 characters and put a 10 character value into it, it still uses 1000 bytes of memory. The alternative is to use a memo type field but unfortunately Andy’s patch does not cover blobs and they still allocate memory using the “midas memory manager” as he puts it. Given that the source to midas is now available it is trivial to apply the techniques of MidasSpeedFix to this area also hence making it possible to use Memo type fields in place of string types. Now you can have great speed and use only the necessary amount of memory at the same time.
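To put rough numbers on the difference, here is a minimal sketch of the two storage costs. It is only a model, not Midas code: the function names are made up for illustration, and the 4 byte blob pointer is the figure quoted in the source comments further down, so real per-record overhead may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a fixed-width string field always costs its
// declared width, no matter how short the stored value is.
std::size_t fixedFieldBytes(std::size_t declaredWidth,
                            const std::vector<std::string>& values) {
    assert(declaredWidth > 0);
    return declaredWidth * values.size();
}

// Hypothetical model: a memo field costs the text length plus an assumed
// 4 byte blob pointer per record (the figure quoted in the post).
std::size_t memoFieldBytes(const std::vector<std::string>& values) {
    std::size_t total = 0;
    for (const std::string& value : values) {
        total += value.size() + 4;
    }
    return total;
}
```

With a declared width of 100 and the single value "Hello World", the first function gives 100 bytes and the second gives 15, which matches the example in the source comments.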
The problem with using blobs in Midas is that both the D2010 version (without MidasSpeedFix, as it doesn’t need it) and earlier versions (with MidasSpeedFix) still handle allocation of memory for blobs the hard way. This causes lots of page thrashing, particularly with large or multiple datasets, so much so that Midas becomes unusable.
Should the application be redesigned?
As an aside, I have always wondered why people think large datasets are bad design. If I have 2GB of memory to work with in my app, and the demands placed upon it are high, then I should be able to use all 2GB of it. The application design should not be limited by poor memory allocation routines. While I agree that one dataset using 1GB of RAM is excessive, what if my application needs to work with 8 * 125MB datasets simultaneously, each running in a separate thread on an 8-core machine?
If I did this with existing Midas the page thrashing is so bad my application will crawl. I shouldn’t need to limit the threads to make this work. Thankfully most of the soft page faults in midas can be fixed – you are still going to get some because the process of allocating memory in the first place is going to cause a soft page fault.
The resolution
When browsing the midas code ($(bds)\source\db\midas\) the first thing I noticed is that blob fields allocate memory within a Grow function (dsblob.cpp) that still “does it the hard way” as Andy has talked about in his post. This Grow function is different to the GrowDS function (ds.cpp) that was previously resolved with the patch (I say previously because you will notice that in the released midas code it is resolved and uses ReAllocMemory so does not need the patch any longer).
The new GrowDS code can be applied to the Grow function as well, making memory allocation for blobs a lot smarter. Simply replace the function below in dsblob.cpp and recompile Midas. (WordPress has butchered the display but it can still be copied and pasted with success. When I host my own WordPress I will fix my template…)
The code
DBIResult DSBLOBS::Grow(INT32 iEntries) // Grow size
{
/***************************************************************************************************/
/* */
/* Unofficial speed fix for Midas blob handling (Delphi/C++Builder 6 to 2010) */
/* Version 1 (2010-11-10) */
/* */
/* The contents of this file are subject to the Mozilla Public License Version 1.1 (the “License”);*/
/* you may not use this file except in compliance with the License. You may obtain a copy of the */
/* License at http://www.mozilla.org/MPL/ */
/* */
/* Software distributed under the License is distributed on an “AS IS” basis, WITHOUT WARRANTY OF */
/* ANY KIND, either express or implied. See the License for the specific language governing rights */
/* and limitations under the License. */
/* */
/* The Original Code is MidasBlobSpeedFix.txt */
/* */
/* The Initial Developer of the Original Code is Cameron Hart (777@flouen.com) */
/* Portions created by Cameron Hart are Copyright (C) 2009 Cameron Hart. */
/* All other code is copyright its respective owner */
/* All Rights Reserved. */
/* */
/***************************************************************************************************/
/* */
/* Now that Midas code is released I have applied the principle of GrowDS to allow FastMM4 to do */
/* its thing. I believe GrowDS was recently changed as of D2009 Update 3 to use the great work of */
/* Andreas Hausladen (MidasSpeedFix.pas). Therefore I have just taken Andys work one step further */
/* to apply it to blobs */
/* */
/* ** The main benefit is that you can now use memo type fields in place of string fields. */
/* ** Since memos are allocated dynamically they only take up the RAM necessary compared to */
/* ** string fields which take up the defined length regardless of size of string stored in them. */
/* ** For example the string “Hello World” stored in a ftString field of 100 length will take 100 */
/* ** bytes of memory, versus 15 bytes in ftMemo field (11 bytes for text + 4 for blob pointer) */
/* */
/* Tests run on Toshiba Tecra P5 Laptop (Core 2 Duo T9300 CPU, 3GB RAM, 5400RPM 2.5 HDD, Win XP) */
/* Test was for time taken / page faults occurred to process a data set containing 1 memo field. */
/* A 778 byte string value was written to the memo field */
/* */
/* Using original allocation method to append */
/* 100,000 records = <4 sec and 570k soft page faults */
/* 200,000 records = 12 secs and 2.4mil soft page faults */
/* 500,000 records = 1 min 56 secs and 15.3mil soft page faults */
/* 1,000,000 records = you’re dreaming – page thrashing is bad at this point, with hard faults too */
/* */
/* Using new allocation method to append */
/* 100,000 records = <1 sec and 24k soft page faults */
/* 200,000 records = <2 secs and 47k soft page faults */
/* 500,000 records = <4 secs and 133k soft page faults */
/* 1,000,000 records = <9 secs and 252k soft page faults (745 MB RAM) */
/* */
/* Notes: */
/* 1. By “original allocation” I mean a build of the original midas code provided with D2010. */
/* 2. All timing etc is estimate. The difference is so blatant I didn’t do any exact profiling */
/* 3. Obviously total memory requirement is not affected by the allocation method. to give an */
/* idea on memory usage it is taking about 745MB for 1mil records with new and original method */
/* 4. Yes it is a dirty hack treating the array like this */
/* 5. Don’t ask me why you would want this many records in a TClientDataSet. Sometimes you just do. */
/* */
/***************************************************************************************************/

/*** NEW CODE STARTS HERE ***/
    pBYTE pTmp;
    UINT32 iMemSize, iNewSize, iNewMemSize;

    iMemSize = sizeof(BLOBEntry) * iSize; //this is size in memory of the array
    iNewSize = iSize + iEntries;
    iNewMemSize = sizeof(BLOBEntry) * iNewSize; //this is the new required size in memory of the array

    if (pBlobEntries)
        pTmp = (pBYTE)pBlobEntries;
    else
        pTmp = (pBYTE)new BLOBEntry[iNewSize]; //first call: no array yet, so allocate one (it is then passed to DsRealloc below)
    pTmp = (pBYTE)DsRealloc((pVOID)pTmp, iMemSize, iNewMemSize);
    if (pTmp == NULL)
        return DBIERR_NOMEMORY;
    pBlobEntries = (BLOBEntry*)pTmp;

    iSize += iEntries;
    return DBIERR_NONE;
/*** NEW CODE ENDS HERE ***/

/*** ORIGINAL CODE STARTS HERE (COMMENTED OUT) ***/
/*
    pBYTE pTmp = (pBYTE)new BLOBEntry[iSize + iEntries];

    if (!pTmp)
        return DBIERR_NOMEMORY;

    ZeroMem(pTmp, sizeof(BLOBEntry)*(iSize+iEntries));
    if (pBlobEntries)
    {
        CopyMem(pTmp, pBlobEntries, sizeof(BLOBEntry)*iSize);
        delete pBlobEntries;
    }

    iSize += iEntries;
    return DBIERR_NONE;
*/
/*** ORIGINAL CODE ENDS HERE ***/
}
The workings
How does it work? I’m a Delphi developer and I had to fumble my way through the Midas code, so I will explain what I did in the hope that some real C++ programmers can point out any problems…
The Grow function (dsblob.cpp) was doing it the hard way. When using blobs it would only allocate room for 16 new blob pointers, and when these were filled it would allocate room for another 16. It did each allocation the hard way: allocating memory for an entirely new array of BLOBEntry (records), zeroing the new memory, copying the contents of the previous array into the new one (leaving 16 new empty slots), and then releasing the old memory. Even though the array contains only pointers and not the actual blob data, this is still a very slow process once you get over 50,000 records, because for every 16 new records the whole array has to be reallocated and copied.
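To see why growing 16 entries at a time hurts so much, here is a rough cost model of the strategy described above. It is a sketch under assumptions, not Midas code: BlobEntryModel is a made-up stand-in for the 8 byte entry (a 32 bit integer plus a 32 bit pointer), and the model counts only the bytes copied from the old array to the new one, ignoring the zeroing and the allocator’s own work.

```cpp
#include <cstdint>

// Stand-in for the 8 byte entry described in the post (hypothetical,
// not the real Midas BLOBEntry declaration).
struct BlobEntryModel {
    std::uint32_t length;
    std::uint32_t dataRef;
};

struct GrowCost {
    std::uint64_t growCalls;    // how many times the array was rebuilt
    std::uint64_t bytesCopied;  // total bytes copied old array -> new array
};

// Model of "allocate a new array, copy the old one, free the old one"
// every time the current capacity is exhausted.
GrowCost simulateChunkedGrow(std::uint64_t records, std::uint64_t chunk = 16) {
    GrowCost cost{0, 0};
    std::uint64_t capacity = 0;
    for (std::uint64_t used = 0; used < records; ++used) {
        if (used == capacity) {
            // Every existing entry is copied into the freshly allocated array.
            cost.bytesCopied += capacity * sizeof(BlobEntryModel);
            capacity += chunk;
            ++cost.growCalls;
        }
    }
    return cost;
}
```

Under these assumptions, 100,000 records need 6,250 rebuilds and roughly 2.5GB of copying, while 200,000 records need roughly 10GB. Doubling the record count quadruples the copying, which is in line with the way the measured times blow up.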
Since it is an array of BLOBEntry, and BLOBEntry is a record containing only a 32-bit integer and a 32-bit PChar pointer, it is represented contiguously in memory and has a fixed, known size per record. It lends itself to being reallocated using ReallocMemory rather than having a new array allocated each time. This means that the growth is handled by the FastMM4 memory manager, which does it much more intelligently. The trick is realising that the size in memory is not just the number of records in the array but that number multiplied by the size of the record, as shown in the original code’s ZeroMem call. The new allocation code therefore takes a pointer to the existing array and reallocates the memory for it rather than recreating it as a new array.
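The same idea can be sketched outside Midas in plain C++. This is only an illustration under assumptions: BlobEntryModel, BlobTable, growTable and freeTable are made-up names, std::realloc stands in for DsRealloc (whose exact behaviour I am not claiming to reproduce), and zeroing the new tail is my own addition that mirrors the ZeroMem call in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for BLOBEntry: plain data, fixed size, so the
// array can safely be moved by realloc.
struct BlobEntryModel {
    std::uint32_t length;
    std::uint32_t dataRef;
};

struct BlobTable {
    BlobEntryModel* entries = nullptr;
    std::size_t size = 0;  // number of entries, not bytes
};

// Grow the table in place by extraEntries; returns false if out of memory.
// The byte size is always entry count * sizeof(entry).
bool growTable(BlobTable& table, std::size_t extraEntries) {
    if (extraEntries == 0) {
        return true;  // nothing to do; avoids a zero-sized realloc
    }
    const std::size_t newSize = table.size + extraEntries;
    void* moved = std::realloc(table.entries, newSize * sizeof(BlobEntryModel));
    if (moved == nullptr) {
        return false;  // the old block is still valid and still owned by table
    }
    table.entries = static_cast<BlobEntryModel*>(moved);
    // Zero only the newly added slots; existing entries are preserved.
    std::memset(table.entries + table.size, 0,
                extraEntries * sizeof(BlobEntryModel));
    table.size = newSize;
    return true;
}

void freeTable(BlobTable& table) {
    std::free(table.entries);
    table.entries = nullptr;
    table.size = 0;
}
```

Note that this sketch obtains the very first block from realloc as well (realloc on a null pointer behaves like malloc), so the memory is never allocated with one mechanism and resized with another.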
The usage:
I have tested the resulting midas.dll in BDS 2006 and Delphi 2010. It is likely that you can use the DLL in all flavours of Delphi/C++ from 6 through to 2010 but for legal reasons you probably need to own a copy of Delphi/C++ 2010.
The result:
I have not profiled the changes in AQTime as I normally would, but the speed difference is so blatant that I see no point. One day I might do a follow-up post with actual performance test results.
I have tested on a Toshiba Tecra P5 laptop. It is one of Toshiba’s better ones but still much slower than a workstation or server. The basic specs are Core 2 Duo T9300 CPU, 3GB RAM, 5400RPM 2.5 HDD, Win XP. These tests were done using Midas.dll; maybe a further slight improvement is to be had by embedding with MidasLib.
The test involved creating a TClientDataSet with a single ftMemo field and appending many records. A 778 byte string value was written into the memo field for each record. The dataset therefore requires roughly 75MB of RAM per 100,000 records (about 745MB was measured for 1,000,000 records).
Using original allocation method to append
100,000 records = <4 sec and 570k soft page faults
200,000 records = 12 secs and 2.4mil soft page faults
500,000 records = 1 min 56 secs and 15.3mil soft page faults
1,000,000 records = you’re dreaming – page thrashing is bad at this point, with hard faults too
Using new allocation method to append
100,000 records = <1 sec and 24k soft page faults
200,000 records = <2 secs and 47k soft page faults
500,000 records = <4 secs and 133k soft page faults
1,000,000 records = <9 secs and 252k soft page faults (745 MB RAM)
Notes:
1. By “original allocation” I mean a build of the original midas code provided with D2010 (without MidasSpeedFix as it is included in this version of Midas.dll)
2. All timing etc is estimate. The difference is so blatant I didn’t do any exact profiling
3. Obviously the total memory requirement is not affected by the allocation method. To give an idea of memory usage, it takes about 745MB for 1mil records with both the new and the original method
4. Yes it is a dirty hack treating the array like this
5. Don’t ask me why you would want this many records in a TClientDataSet. Sometimes you just do.
Precompiled binary version
Lots of people have asked for a precompiled binary so they can test this themselves. I am in the process of reviewing the legal details as to whether I can put it on this site for download. If someone knows the answer already please let me know. Better yet, can a real C++ programmer provide a make script to compile this with the free Borland C++ Compiler 5.5 (available at the Embarcadero website)?