PdxHak, PdxHakRecover and PdxHakViewer

This set of tools makes it possible to read corrupt (and even incomplete) Paradox tables without using the BDE distributed by Borland. The work is entirely based on pure reverse-engineering of the file format, so it does not require third-party libraries nor tools.

Simply put, what you find here is the most reliable way to recover corrupt Paradox tables, and is offered free of charge with full Delphi source code (a single file of about 3000 lines, roughly 100Kb); a compiled command-line utility for Windows is also included in the package.

Actually, this code may also be used to read healthy Paradox tables under Linux, where no BDE is available.

License & distribution

This work is released under the GNU GPL license.

Download

The package, which includes the full source code and a Windows binary is freely available on sourceforge, though you may still be able to find older versions at my former company's website.

The project is also listed on Borland's site as Borland CodeCentral ID 19244.

Documentation

There is no separate documentation, but the rest of the page you are reading now contains useful information extracted from the source code, and should be enough to get you started.

In addition, the entire source code is heavily commented.

Feedback

As of year 2004, I am no longer working on this code, but if you need help you can still reach me at:

a.biardi@tiscali.it

I may not be prompt in replying, so please be patient.

Features

With the code in this unit, you can:

Open a table which is no longer accessible via the BDE, and read in field definitions; this applies to versions 3.0, 3.5, 4.0, 5.0 for Windows, and 7.0;
Create a new BDE table, borrowing the field structure from the corrupt one (including retaining current values for ftAutoInc fields);
Navigate the table and read field values;
Access records which can't be reached through the BDE (such as deleted records; see Implementation Notes, below).

You can not:

Access a password-protected (encrypted) table; I know that the encryption method used by Paradox is not so secure and can be easily bypassed, so maybe support for encrypted tables may be added in future versions;
Alter in any way a (corrupt) table (this unit only provides readings, not writings; would you ever trust a repaired table?);
Get extended information, such as validity check constraints, index data, referential integrity, and so on.

Compatibility

This code was developed, compiled and tested on MS Windows (Delphi 6.0), but contains some logic to compile on Linux too (Kylix 3.0). Since DBExpress/BDE support is optional, you can actually use this code with Kylix 3.0 Open Edition for Linux to gain (read-only) access to Paradox tables written by Windows programs.

How the story began

Tired of corrupt Paradox tables, I started working on a BDE-independent code that would allow recovering data from such files.

Thanks to Borland for not releasing any docs about Paradox file formats, it was rather difficult to reverse-engineer the table structures.

I've tried to keep things simple, and I've concentrated on core routines, rather than building an entire, GUI-based, application. If you are interested in creating one, feel free to do it (see License & distribution, above).

The Paradox format

I'm not going to give a full explanation of how Paradox stores table data; anyway, have a look at the section "Useful Links" below, or try contacting me if you have any questions.

Quickstart

This comment will guide you to understand how TPdxHak works, and by the end you will be presented a short but complete code that fully implements reading data from a corrupt table, and copying each record and each field to a newly-created TTable; you will be surprised of how much effort has been made in order to keep the whole process simple.

If you are wondering, the 'PdxHak' prefix stands for Paradox Hacker; and, yes, I'm not that good in choosing file names, but by the time this work was finished, I just didn't want to rename anything..

For the example we are going to see, we will refer to a generic section of code, which may well be a event procedure (such as a TForm1.Button1Click); take this code as an extract and as a starting point, and build your own code to support and run it; a little of object-oriented programming in Delphi knowledge is required here, anyway.

Now, forget a while about retrieving data, and let's talk about navigation.

Traditional DB datasets can be scrolled by using First, Last, Prior, Next and other methods and a few of them (including TTables accessing Paradox files) by setting the RecNo property, thus completely hiding the concept of database pages (or blocks).

In order to keep the implementation of TPdxHak simple, I will refer to an active page (see the Page property) which ranges from 0 to PageCount-1; with TPdxHak you will only be able to access one database page at a time, but you will be free to access as many records as you like in that page (opposed to the traditional TDataSet interface, where you only have one active record at any given time).

As an example, this code will loop for the entire content of a table:

for i:= 0 to h.PageCount-1 do
begin

  h.Page:= i;

  for j:= 0 to h.RecordCount-1 do
  begin

    // ..do whatever you like with record <j> in page <i>

  end;

end;

Please note that RecordCount refers to the number of records in the active page, and NOT in the whole table as it would be with the BDE; also, MaxRecordCount refers to a page, but - in this case - the actual active page number is not relevant (this property tells you the maximum number of records that could possibly be stored in a table page).

You should have noticed, by now, that not all pages will have the same number of records; Paradox, in fact, (and other database engines, of course) uses a paged format to increase performance and reliability.

Sometimes you will find that a big table, which should hold thousands of records, only shows a few (or, worst, seems empty). In this case, you will need to enable the phoRetrieveDeleted option that allows deleted (and inexistent) records to be seen; this will simply bypass the calculation of how many records are in the active page (see TPdxHak.GetRecordCount), and will tell TPdxHak to retrieve up to the maximum number of records per logical page (see GetMaxRecordCount); be aware that this option will also retrieve data from not-yet-allocated records, which usually just contain null (empty) fields.

The active page can be unloaded (discarded from memory) by setting its value to a negative number (I would use -1); this will also unload any allocated blob buffers. Almost all data gets cached, and accessing the same data twice should not reload it from disk.

Field definitions (the list of fields in the table structure) can be accessed through the FieldDefs property; it should not be very different from accessing FieldDefs in a standard BDE TTable component. See below for an example.

Now, about record data retrieval. You can access field values with the Fields array property (which ranges from 0 to FieldDefs.Count-1), of a given record, as well as with the FieldByName method, much like as you would do with a traditional TDataSet.

Conversion routines have been provided so that you can almost always access a field through its AsString property, and numbers, booleans, dates and even text blobs all get properly translated.

However, a few extra routines have been designed to facilitate transferring data to a standard TField component; say, for example, you would like to populate a field content regardless of its type; then consider something like this:

MyNewTable.Fields[n].Assign(MyPdxHak.Fields[m]);

which does all of the work (if any is really needed) internally. This will also work for common field-based assignments, with TStrings, TBitmap and TPicture destinations:

MyMemo.Lines.Assign(MyPdxHak.FieldByName('Notes'));
MyPicture.Assign(MyPdxHak.FieldByName('Image'));

..it's cool, isn't it?

As a general rule, exceptions are raised for any invalid operations; error messages come with possible suggestions (see the resource string table).

Blob streams are also supported; if you ever used a TBlobStream object to read data from a regular blob field (such as a TMemoField from a standard TTable), you will find this support useful and easy to use:

Stream:= TPdxHakBlobStream.Create(MyPdxHak.Records[0].Fields[2]);
...

Position, Seek, Read, Size and other properties and methods are fully supported, but keep in mind that this stream is read-only.

A couple of options alter the way blob data is processed (see Implementation notes, below).

Finally, as promised, I'm presenting a complete code portion which does all you need to create a new TTable file and populate it by reading the corrupt table with TPdxHak; this code is also included in the companion procedure DoTheWholeStuff.

h:= TPdxHak.Create('c:\temp\badtable.db');
try

  with TTable.Create(nil) do
  try

    TableName:= 'c:\temp\rebuilt';
    FieldDefs.Assign(h.FieldDefs);
    CreateTable;
    Open;

    for i:= 0 to h.PageCount-1 do
    begin

      h.Page:= i;

      for j:= 0 to h.RecordCount-1 do
      begin

        Append;
        try

          for k:= 0 to FieldCount-1 do
            Fields[k].Assign(h.Records[j].Fields[k]);

          Post;

        except
          Cancel;
          raise;
        end;

      end;

    end;

  finally
    Free;
  end;

finally
  FreeAndNil(h);
end;

That's it. Hope you like the simplicity of this code.

Implementation notes

Whenever you set the Page property to a different value, you are loading the entire page in memory (which can be 1 to 32Kb); blobs get loaded only if accessed (it's safe to test IsNull, as well as BlobSize, which will not try to load any additional data). Once the page number has been set, you can read records and fields from that page.

Turn off assertions ($C- or $ASSERTIONS OFF directives) to improve performance; anyway, consider that when designing this code, every effort was made in order to READ tables, not to read them FAST. Especially when reading tables with blob fields, you might find this unit being quite slow.

Strong error checking has been enforced in order to get the very details of what's going on (see, for example, the - maybe excessive - error control on seek and read statements); since you will probably use this unit to access a corrupt table, you might need to bypass some assertions and validity checks that prevent TPdxHak to read your data. In many cases, however, lowering the level of error checking may cause errors and other strange behaviours.

One of the most interesting capabilities of this unit is the possibility to change field definitions on the fly, which is especially useful if you cannot recover a table's header; changing field definitions is allowed without any special precautions, even when the table has already been opened.

However, consider reloading the current active page in order for changes to be seen by your application (and, possibly, by the user). Be aware that supplying bad (manual) parameters for a table could lead to several kind of errors, including assertion failures, GP faults and other beasts; while this is code is robust enough to handle such situations, your nice, GUI-based, application might not.

Field definitions (see object TPdxHakFieldDefs) can include nameless fields (and non-unique names), if you don't really care of field names, and access them mainly by field index.

This code will also read (old) Paradox 3.0 tables, which cannot be created using the BDE. I do not actually know if the BDE can read them at all; just to clear any doubts, consider the following table:

TblVersion	TableLevel	Description
htv30	-	Paradox 3.0
htv35	3	Paradox 3.5
htv40	4	Paradox 4.0
htv50	5	Paradox 5.0 for Windows
htv70	7	Paradox 7.0

The only file needed to recover data from a Paradox table is the .db file itself. Blob data will of course require the additional .mb file to be in the same directory.

If you do not want blob support, then turn off phoRetrieveBlob and phoPartialBlob options; all blob fields will appear to be null, AsBlob property will always return a nil pointer, and Blobsize will always return 0.

Partial blob data retrieval can be achieved with the phoPartialBlob option; this will force TPdxHak not to access the .mb file, and only retrieve blobs (if any) from the embedded data in the .db file; this might not be what you are looking for, and while it is safe to truncate text blobs, other blob field types (such as ftGraphic or ftParadoxOle) would probably lead to unuseful data.

The default option is to have full blob support (phoRetriveBlob option); if this is not possible (for example, the .mb file does not exist), TPdxHak will resort to behave as if the phoPartialBlob option had been set.

Please note that while ftBytes (type Y) is not a blob field (it is entirely stored in .db files, and has a fixed size), it can be accessed only through AsBlob; see also GetBlobSize, which can determine if such a field is null -- Honestly, I really thought a field of this type could never be 'empty'. However this reflects how the BDE behaves..

For ftAutoInc fields, they're actually treated as standard ftInteger ones (in fact, there's no difference, strictly speaking); when copying FieldDefs from TPdxHak to a TTable, this unit takes care of hiding the special feature of those (read-only) fields, so they are translated into ftInteger equivalents. If you need to have them back in the original format (ie, with auto-increment capability), have a look at the companion procedure IntegerToAutoInc, which should self-explain.

About supplying manual parameters, and setting options.

Sometimes, you will find useful to manually set a number of parameters, which are normally loaded from the .db file, such as RecordSize, HeaderSize, PageSize, TblVersion and FieldDefs. These values are read when the first access to the .db file stream is made, and only if their value has not already been set yet; of course, you could also set these properties anywhere your code needs to.

Furthermore, the TblVersion value is used when first accessing the FieldDefs property (either explicitly or implicitly); changing the value of TblVersion property afterwards has no effects. In this case, set it just after creating the TPdxHak object, or supplying manual field definitions (ie, populate the FieldDefs object by manually adding fields).

Options phoRetrieveHeader, phoRetrieveFieldDefs and phoRetrieveFieldNames are only needed once, and if changed after the corresponding function has been completed produces no effects (for example, turning off the phoRetrieveHeader options when the header has already been loaded does not make much sense).

Other options (phoRetrieveDeleted, phoIgnoreFileSize, phoRetrieveBlob, and phoPartialBlob) can be safely turned on and off as necessary. In this case you should consider reloading the current active page or re-read affected properties (such as PageCount, RecordCount, and so on).

BDE dependency

While this code still uses the DBTables unit (and, thus, the BDE), this is of course not needed to access Paradox files (if you are curious, the only portion of code which requires the DBTables unit is in TPdxHakField.SaveToField).

If you don't want BDE support, undefine HAVE_DBTABLES_UNIT in the options.inc file: this should also help compiling on Linux (also undefine HAVE_DB_UNIT if you are using Kylix 3 Open Edition).

Known bugs & limitations

Due to bugs and differences in the BDE / Delphi timestamp codings, date values before 01/01/0001 (that is, year 1 AD) are not reliably decoded. Also consider that the BDE will allow you to store negative year numbers, and database applications such as Database Dekstop will display years regardless their sign (!!), while EncodeDate and DecodeDate will not work correctly (they will display 00/00/0000 or other strange values; if you are curious, you will also find that Delphi inserts a 'phantom' day between 31 Dec 0000 and 01 Jan 0001).

BDE does not correctly load values in BCD fields with high precision (tested with 32 digit precision, but may also happen with a lower number); yet, this code does! Also, BDE admits BCD fields with precision 0, while Database Desktop does not (try restructuring a table with a #0 type field).

Finally, bugs in the BDE implementation do not allow this code to create certain types of blob fields correctly, but this is a minor issue. See TPdxHakFieldDefs.SaveToFieldDefs for details.

Credits

Part of this work is based on (old) documentations from Kevin Mitchell (CIS 70717,475) and Randy Beck (bex@compuserve.com, http://ourworld.compuserve.com/homepages/bex), though the entire code has been designed and written from scratch.

But at the time of this writing, I was not able to contact either of the two.

Contributing

If you like this software, you can support it by donating some money. If you do, 10% of the total amount of your donation will automatically go to the Open Source Initiative (OSI), a non-profit organization that promotes the Open Source Definition for the good of the community (www.opensource.org). Thanks.