Database comparison

Introduction

The database comparison will compare, in two databases, elements with the same structure. More precisely, it will run a given sql query in both databases and compare the query results.

Sql query files are stored in the ‘queries’ folder, referenced in the comparison file and run on both databases. The simplest use is to have sql queries that perform a Select all columns in a single table, but more complex queries can be used.

Configuration file

A minimal bacmp configuration file for database will be as below

The "type" is the database type. Supported values are:

DB2
MySql
Oracle
Postgres
SqlServer

For a given connection to work properly, the corresponding driver must be in the home folder:

The "leftDb" and "rightDb" represent the connection to a database.
"dbName" or "serviceName", "hostName", "port" correspond to the database url. For instance:

Oracle:
Caution, dbName and serviceName cannot be used at the same time. Only one of those field must be set in your .bacmp file.
- jdbc:oracle:thin:@hostName:port:dbName
- jdbc:oracle:thin:@hostName:port/serviceName
Postgresql: jdbc:postgresql://hostName:port/dbName
…

"user" and "pwd" are the connection credentials. The "schema" to use depends on the database. In most database systems, it represents the database subdivision where the tables are stored. For instance, in Postgresql, it is most often “public”, in Oracle, it is usually the same as the user name.

Custom Connection String

An alternative way to describe your database in the bacmp file can be to use the optional "customDbConnectionUrl" field. With this field you can directly provide the connection string for the database. In this case you can remove the "dbName", "hostName" and "port" fields. If the user and password are stored in your connection string, you have to remove those fields from the database configuration.
Below is an example of a simple bacmp file using two way to connect to the same database with "customDbConnectionUrl" :

{
  ...
    "leftDb": {
      "customDbConnectionUrl": "jdbc:oracle:thin:MyUser/PASSW0RD@hostname:1234:mydatabase",
      "schema": "sys",
    },
    "rightDb": {
      "customDbConnectionUrl": "jdbc:oracle:thin:@hostname:1234:mydatabase",
      "pwd": "PASSW0RD",
      "schema": "sys",
      "user": "MyUser"
    },
  ...
}

AWS Secrets Manager

It's also possible to compare databases through AWS Secrets Manager. In your .bacmp file, give the "secretName" of your database and nothing else. Then paste your credentials of the account that have access to this AWS Secrets Manager database in your environment and launch the Compare Tool.

"rightDb": {
   "dataBaseAWSSecretName": "secret/mysecret"
},

Note that you can add any of the usual parameter concerning the database, like "customDbConnectionUrl" or "serviceName" as a secret value in AWS Secrets Manager depending on how you want to connect to your database.

Elements to compare

"selectedElements“: ["...", "...“] specifies the elements that must be compared.
This corresponds to query files residing in the ‘queries’ folder. For instance, if you have this ‘queries’ folder:

And you have this configuration:
"selectedElements": ["mytable1", "mytable2", "mytable4"]
This means you will compare mytable1, mytable2 and mytable4, but not mytable3. The specified queries are run on both the “left” database and the “right” database, and the results are compared.

Query files

The Sql query files must be SELECT queries returning the data to compare.
They will often be simple SELECT on a table, but might be more complex. See Complex queries.

There is no requirement that the sql file name corresponds to the table name.
For instance, you may have a query that selects data with a restricting where clause:
SELECT … FROM MYTABLE WHERE category = 'USR'
In this case, you may want to call the sql file ‘mytable1-usr.sql’ and use this name in the "selectedElements" tag.

Business keys

In addition to the SQL query, the sql file may contain a business key definition with the syntax -- Business_key: <column-name>; after the query. For instance:

SELECT id, firstname, name FROM PERSON;
-- Business_key: id;

The business key definition allows to enforce a comparison between two identified rows.

With ‘id’ as a business key, there may be two cases:

The comparison result will show that the row with business key 1 was modified:

The comparison result will show that the row with business key 1 appears only in left and the row with business key 2 appears only in right:

On the other hand, if the business key is ‘name’ instead of ‘id’, then both cases will show a modified row for business key ‘Kent’.

Notes:

The Business Key definition may contain several comma-separated columns, and/or correspond to complex constructs in the query (see Complex queries)
The Business Key definition should be used alongside an ORDER BY clause with the same column(s). This way, the rows are already in the correct order, the compare tool has less trouble reconciling the data and the overall performance is better.
The Business key definition is added as a comment after the query, so you can run the script in your favorite SQL tool without a need to amend it.

Presql

In some situations, the user may want to cleanup the database, or perform any other preliminary sql task, before launching the comparison. In this case, scripts can be put in a ‘presql’ folder beside the ‘queries’ folder.

If a script is found in the folder, it will be run at the beginning of the comparison.

Additional properties

In addition to the mandatory properties shown in Configuration file, there are additional properties. Most have default values that are used if the property does not appear in the file, but can also be specified to another value. Others are not used if left unspecified.

Default values are written in green below when they exist, a red ‘no value’ otherwise Please note that the double backslashes are only the json syntax for a backslash.

Specific to the database comparison

`preSqlFolder`	Customize the folder containing the presql scripts	See Presql
`queriesFolder`	Customize the folder containing the query files	See Elements to compare

Common properties

`comparisonDataStorage`	InRam (default) / OnDisk	See Disk mode
`csvSeparator`	Internal separator used for the comparison algorithm. Use a character that is not likely to appear in the data.
`differencesFolder`	Intermediate folder (OnDisk mode)	See Disk mode
`leftCsvFolder`	Intermediate folder (OnDisk mode)	See Disk mode
`reportFolder`	Folder containing the comparison HTML report
`rightCsvFolder`	Intermediate folder (OnDisk mode)	See Disk mode
`keepDifferencesFiles`	Intermediate folder cleanup (OnDisk mode)	See Disk mode
`keepExtractedCsvs`	Intermediate folder cleanup (OnDisk mode)	See Disk mode
`logLevel`	ERROR / WARN / INFO (default) / DEBUG The log is displayed in the console and also in a file beside the CompareTool.exe
`maximumDifferences`	Discard extra errors if too many error lines	See Html options
`reportSideBySide`	Another display mode for the HTML report	See Report side by side
`reportSplitLine`	Generate sub-reports if too many error lines	See Html options
`reportTitle`	Customize the report title	See The index page
`splitLines`	Split data before comparison	See Comparison performance
`splitSize`	Split data before comparison	See Comparison performance
`timeStampedReportFolder`	Append a timestamp to the report folder, to keep history when doing several runs
`twoPassDifferences`	Reconcile data across splits	See Comparison performance
`ignoreBlankLines`	Blank lines in files are ignored. Only useful in other comparison types, not database.
`spacesMode`	NOTHING: does nothing TRIM_BLANK: trim the field values if they are blank TRIM_START: removes the initial spaces on each line TRIM_END: removes the final spaces on each line TRIM: removes the initial and final spaces on each line REMOVE: removes all spaces in lines, even in the middle of the line
`caseInsensitive`	Comparison ignores field case

Complex queries

As stated in Query files, a query file is only a SELECT query, where you can put anything you want. It will often be a simple “select all columns from a table”, but more can be done:

Join several tables
Restrict rows
Remove or replace columns (this enables the same options as in Additional properties for flat files)
Concatenate columns
Convert some values
Etc.

Anyway, keep in mind that the tool will compare the columns returned by the query, which may not be actual columns in a table. The aliases in the select clause (implied or explicit) will be used to compare data and also to supply the headers in the report table.
The same way, the business key “columns” explained in Business keys correspond to such aliases and may in fact be more complex constructs.
Though very awkward, the following query file is valid:

SELECT (column1 || column2) as id1, column3, UPPER(column4) as upper4 FROM mytable; -- Business_key: id1,column3;

But keep in mind that the overall performance of your comparison is affected by the query complexity.