Database comparison

Introduction

The database comparison will compare, in two databases, elements with the same structure. More precisely, it will run a given sql query in both databases and compare the query results.   

Sql query files are stored in the ‘queries’ folder, referenced in the comparison file and run on both databases. The simplest use is to have sql queries that perform a Select all columns in a single table, but more complex queries can be used. 

Configuration file

A minimal bacmp configuration file for database will be as below

image.png

The "type" is the database type. Supported values are: 

  • DB2
  • MySql
  • Oracle
  • Postgres
  • SqlServer

For a given connection to work properly, the corresponding driver must be in the home folder:

image.png

The "leftDb" and "rightDb" represent the connection to a database.    
"dbName" or "serviceName", "hostName", "port" correspond to the database url. For instance: 

  • Oracle: 
    Caution, dbName and serviceName cannot be used at the same time. Only one of those field must be set in your .bacmp file. 
    • jdbc:oracle:thin:@hostName:port:dbName
    • jdbc:oracle:thin:@hostName:port/serviceName
  • Postgresql: jdbc:postgresql://hostName:port/dbName

"user" and "pwd" are the connection credentials. The "schema" to use depends on the database. In most database systems, it represents the database subdivision where the tables are stored. For instance, in Postgresql, it is most often “public”, in Oracle, it is usually the same as the user name.

Custom Connection String

An alternative way to describe your database in the bacmp file can be to use the optional "customDbConnectionUrl" field. With this field you can directly provide the connection string for the database. In this case you can remove the "dbName", "hostName" and "port" fields. If the user and password are stored in your connection string, you have to remove those fields from the database configuration.
Below is an example of a simple bacmp file using two way to connect to the same database with "customDbConnectionUrl" :

{
  ...
    "leftDb": {
      "customDbConnectionUrl": "jdbc:oracle:thin:MyUser/PASSW0RD@hostname:1234:mydatabase",
      "schema": "sys",
    },
    "rightDb": {
      "customDbConnectionUrl": "jdbc:oracle:thin:@hostname:1234:mydatabase",
      "pwd": "PASSW0RD",
      "schema": "sys",
      "user": "MyUser"
    },
  ...
}

AWS Secrets Manager

It's also possible to compare databases through AWS Secrets Manager. In your .bacmp file, give the "secretName" of your database and nothing else. Then paste your credentials of the account that have access to this AWS Secrets Manager database in your environment and launch the Compare Tool.

"rightDb": {
   "dataBaseAWSSecretName": "secret/mysecret"
},

Note that you can add any of the usual parameter concerning the database, like "customDbConnectionUrl" or "serviceName" as a secret value in AWS Secrets Manager depending on how you want to connect to your database.

Elements to compare

"selectedElements“: ["...", "...“] specifies the elements that must be compared.    
This corresponds to query files residing in the ‘queries’ folder. For instance, if you have this ‘queries’ folder:

image.png

And you have this configuration:    
"selectedElements": ["mytable1", "mytable2", "mytable4"   
This means you will compare mytable1, mytable2 and mytable4, but not mytable3. The specified queries are run on both the “left” database and the “right” database, and the results are compared. 

Query files

The Sql query files must be SELECT queries returning the data to compare.    
They will often be simple SELECT on a table, but might be more complex. See Complex queries.    

There is no requirement that the sql file name corresponds to the table name.    
For instance, you may have a query that selects data with a restricting where clause:    
SELECT … FROM MYTABLE WHERE category = 'USR'    
In this case, you may want to call the sql file ‘mytable1-usr.sql’ and use this name in the "selectedElements" tag. 

Business keys

In addition to the SQL query, the sql file may contain a business key definition with the syntax -- Business_key: <column-name>; after the query. For instance:    

SELECT id, firstname, name FROM PERSON;    
-- Business_key: id;    

The business key definition allows to enforce a comparison between two identified rows.    

With ‘id’ as a business key, there may be two cases:

  • The comparison result will show that the row with business key 1 was modified:
image.png
  • The comparison result will show that the row with business key 1 appears only in left and the row with business key 2 appears only in right:
image.png

On the other hand, if the business key is ‘name’ instead of ‘id’, then both cases will show a modified row for business key ‘Kent’.   

Notes:

  1. The Business Key definition may contain several comma-separated columns, and/or correspond to complex constructs in the query (see Complex queries)
  2. The Business Key definition should be used alongside an ORDER BY clause with the same column(s). This way, the rows are already in the correct order, the compare tool has less trouble reconciling the data and the overall performance is better.
  3. The Business key definition is added as a comment after the query, so you can run the script in your favorite SQL tool without a need to amend it.

Presql

In some situations, the user may want to cleanup the database, or perform any other preliminary sql task, before launching the comparison. In this case, scripts can be put in a ‘presql’ folder beside the ‘queries’ folder. 

image.png

If a script is found in the folder, it will be run at the beginning of the comparison. 

Additional properties

In addition to the mandatory properties shown in Configuration file, there are additional properties. Most have default values that are used if the property does not appear in the file, but can also be specified to another value. Others are not used if left unspecified.    

Default values are written in green below when they exist, a red ‘no value’ otherwise Please note that the double backslashes are only the json syntax for a backslash. 

DatabaseComparison-AdditionalProperties.PNG

Specific to the database comparison 

preSqlFolderCustomize the folder containing the presql scriptsSee Presql
queriesFolderCustomize the folder containing the query filesSee Elements to compare

Common properties

comparisonDataStorageInRam (default) / OnDiskSee Disk mode
csvSeparatorInternal separator used for the comparison algorithm. Use a character that is not likely to appear in the data. 
differencesFolderIntermediate folder (OnDisk mode)See Disk mode
leftCsvFolderIntermediate folder (OnDisk mode)See Disk mode
reportFolderFolder containing the comparison HTML report 
rightCsvFolderIntermediate folder (OnDisk mode)See Disk mode
keepDifferencesFilesIntermediate folder cleanup (OnDisk mode)See Disk mode
keepExtractedCsvsIntermediate folder cleanup (OnDisk mode)See Disk mode
logLevelERROR / WARN / INFO (default) / DEBUG    
The log is displayed in the console and also in a file beside the CompareTool.exe
 
maximumDifferencesDiscard extra errors if too many error linesSee Html options
reportSideBySideAnother display mode for the HTML reportSee Report side by side
reportSplitLineGenerate sub-reports if too many error linesSee Html options
reportTitleCustomize the report titleSee The index page
splitLinesSplit data before comparisonSee Comparison performance
splitSizeSplit data before comparisonSee Comparison performance
timeStampedReportFolderAppend a timestamp to the report folder, to keep history when doing several runs 
twoPassDifferencesReconcile data across splitsSee Comparison performance
ignoreBlankLinesBlank lines in files are ignored. Only useful in other comparison types, not database. 
spacesModeNOTHING: does nothing    
TRIM_BLANK: trim the field values if they are blank    
TRIM_START: removes the initial spaces on each line    
TRIM_END: removes the final spaces on each line    
TRIM: removes the initial and final spaces on each line    
REMOVE: removes all spaces in lines, even in the middle of the line
 
caseInsensitiveComparison ignores field case 

Complex queries

As stated in Query files, a query file is only a SELECT query, where you can put anything you want. It will often be a simple “select all columns from a table”, but more can be done: 

  • Join several tables
  • Restrict rows
  • Remove or replace columns (this enables the same options as in Additional properties for flat files)
  • Concatenate columns
  • Convert some values
  • Etc.


Anyway, keep in mind that the tool will compare the columns returned by the query, which may not be actual columns in a table. The aliases in the select clause (implied or explicit) will be used to compare data and also to supply the headers in the report table.    
The same way, the business key “columns” explained in Business keys correspond to such aliases and may in fact be more complex constructs.    
Though very awkward, the following query file is valid:    

SELECT (column1 || column2) as id1, column3, UPPER(column4) as upper4 FROM mytable; -- Business_key: id1,column3;    

But keep in mind that the overall performance of your comparison is affected by the query complexity.