Logo Tamarack Associates
Products Download Support Order Contact

Full Text Search Product Comparisons

The following table compares Rubicon with DBISAM's full text search capability. Please be sure to read the explanations that follow the table as they elaborate on each point. A test program and database are available at the bottom of this page for download.

As the table and explanations illustrate, Rubicon emphasizes speed and customization in all aspects of indexing, updating, and searching. 

  Rubicon 2.20 DBISAM 3.26
Integration Add on components Built in
Portable Yes No
FTS index storage Separate table IDX file
FTS index location Any drive or directory In database
Index on separate PC Yes Yes
Indexing open table Yes No when restructuring
Access to indexed words Yes No
Index all field types Yes Only native fields
Custom field handling Yes No
Custom word handling Yes No
Partial indexing Yes No
Segmented indexing Yes No
Index 50mb speed 1 minute 7 seconds 9 minutes 12 seconds
Index 50mb size 52mb 29mb
Search logic and, or, not, near, phrase, like and, or, not
Ranking Yes, optional Yes, optional
Wildcard searching Yes Yes
Matching words & counts Yes No, just words
Limit match results Yes Yes
Match dataset optional Yes Yes, with filters
Search speed narrow 0.06 seconds 0.11 seconds
Search speed normal 1.2 seconds 2.5 seconds
Search speed vague 18 seconds (see notes) 36 seconds

Integration

Rubicon is a set of add on components that handle the full text search (FTS) functions of building, updating, and searching of word indexes. DBISAM offers a built in FTS feature that does not require any additional components.  DBISAM FTS is accessed through SQL function like TEXTSEARCH and TEXTOCCURS.

Portable

Rubicon FTS is portable across a wide range of SQL and table based databases including Advantage, ADO, Apollo, BDE, DBISAM, Direct Oracle Access, FlashFiler, InterBase Express, InterBase Objects, NexusDB, ODBC Express, and Topaz.

FTS Index Storage

Rubicon FTS indexes are stored in separate tables. DBISAM FTS indexes are stored in the table's idx file. Rubicon stores its indexes in a separate table, so you have the choice of backing up this file.

FTS Index Location

Rubicon indexes are not tied to the database of the table being indexed. The Rubicon indexes may be stored anywhere.

Index on Separate PC

Since indexing is resource intensive, it is often useful to perform this task on a separate PC. Rubicon can read the target data from a remote system and create the FTS indexes on the local system. DBISAM can index a table on another system, but the FTS indexes have to be written back to that system.

Indexing Open Table

Rubicon does not require exclusive access to the table when indexing so the table may be open for read access by other users. When adding FTS indexing to an existing DBISAM table, the table must be restructured which requires exclusive access.

Access to Indexed Words

The words indexed by Rubicon are visible in the "words" table created by Rubicon. This table includes the number of occurrences of the word in the table. The ability to view the words provides useful feedback when fine tuning the parsing and definition of words.

Index All Field Types

Rubicon can index all native fields types plus calculated and lookup fields.

Custom Field Handling

With Rubicon you can customize the handling of the text in the field. If the field contains RTF or HTML, the field can be converted to plain text before indexing. If there are certain parts of the text you do not wish to be indexed, you can prevent those parts from being indexed.

Custom Word Handling

Rubicon allows you to completely control which words are indexed. You can customize the parsing routine and inspect each word before it is indexed. For instance, the parsing routine may return '00000xyz' as a word, but by using the Rubicon TrbAccept component (or your own code) this word can be rejected for indexing. 

Partial Indexing

Partial indexing is useful when you are trying to refine your word parsing and definition routines. Instead of indexing the entire table, you can index the first 1000 or so records and inspect the results.

Segmented Indexing

Indexing very large tables (number of records, number of unique words, or both) is very resource intensive and may be impossible unless the task is broken down into a series of smaller tasks. Rubicon addresses this issue by using segmented indexing.  Segmentation allows Rubicon to divide the table being indexed into small pieces virtually (Rubicon never alters the target table) and each piece or segment is indexed individually.

Index 50mb Speed

Time to index a 50mb table of 15,000 newsgroup messages with 326,000 unique words. Rubicon extensively uses in memory caching and index compression to minimize disk access.

Index 50mb Size

Size of the indexes created. Rubicon stores its indexes in the same table type as the table being indexed, in this case a DBISAM table. The combined size of the DBISAM blb, dat, and idx files are 52mb. For DBISAM FTS, the indexes are stored in the idx file. After restructuring the DBISAM table, the idx file size grows by 29mb. 

The Rubicon index size could be significantly reduced by using a 64 byte BlobBlockSize and by taking advantage of Rubicon's custom word handling to reject many of the 326k words.

Search Logic

Operations supported by the FTS engine. The Rubicon NEAR operator has a default distance of eight words. The LIKE operator uses a modified soundex routine, but a custom routine may also be used.

Wildcard Searches

Rubicon supports single and multi character wildcard searches.

Matching Words & Counts

Matching words and counts are available via the Rubicon MatchingWords method. This information provides valuable feedback to the user, especially when using wildcard searches.

Ranking

Rubicon search results can be ranked by count, percent (record with highest count is 100%), and presence (word only counted once per record). Search performance can be improved when ranking is disabled.

Limit Match Results

It is not unusual for users to submit vague searches that return thousands of matches. In these instances it is not generally useful to return a complete result set, but rather return the first 100 matching records and allow the user to refine the search. With Rubicon you can perform the search, determine the number of matching records, and then decide whether to create a dataset of matching records and, if so, how many records to include. With DBISAM you can limit the query dataset size by using the MaxRowCount property, but then you cannot determine how many records actually matched the search criteria.

Match DataSet Optional

Rubicon does not require that a match dataset be created, which can be time consuming (especially with a large number of matching records). Instead the first page of match results can be created and displayed to the user by navigating through the indexed table using the Rubicon FindFirst and FindNext methods. These methods navigate in natural order or in rank order.

Search Speed Narrow/Normal/Vague

Time to complete ranked searches and create a dataset containing the matching records. The matching dataset sizes for narrow, normal, and vague searches are 14, 238, 5641 records, respectively. The search words were "century", "workstation" , and "windows".  

When performing non proximity searches with Rubicon, the matching record count is available  immediately (under 100ms on the slowest systems, less than 1ms on a 2.8ghz P4) regardless of the number of matching records. Thus the Rubicon search times are almost entirely comprised of ranking and the creation of the dataset containing the matching records. To manage the search time, your Rubicon application can execute the search, immediately determine the number of matching records, and then decide how to proceed on the more time consuming tasks of ranking and match dataset creation. 

Rubicon match results can be created in any supported dataset. During testing it was discovered that a TClientDataSet performed much better than an in memory TDBISAMTable. The Rubicon vague search was performed in 18 seconds with a TClientDataSet and 60 seconds using an in memory TDBISAM table. The test program available for download does not use a TClientDataSet.

Test Program & Database

Download the test program (executable and source code) and database files and unzip them in their own directory. Run the program, create the FTS indexes, and perform searches.

Test Results

Tests were performed under Windows XP, 2.8ghz P4 (hyperthreading enabled), 7200rpm ATA drive, 1gb memory.

 




Copyright 2006 © Tamarack Associates 
www.TamarackA.com www.FullTextSearch.com