Skip to content

Configuring a Reconciliation Job

This is the complete reference for configuring a reconciliation job. It covers every option you can set, what it does, and how the options interact. For the behaviour behind these options — how types are inferred, how filters compare, how tolerances are evaluated — see Data Types, Filtering & Comparison.

A job ties together two data sources and tells DataRecs what to compare and how strict to be. The top-level fields are:

FieldRequiredPurpose
nameYesHuman-readable job name. Appears in notifications and run history.
workspace_idYesThe workspace the job belongs to.
dimensionsYesColumns to group by (the GROUP BY keys).
measuresYesColumn + aggregation pairs to compare. At least one required.
connectionsYesThe two data sources, with their source location, column mapping, and filters.
stagesYesOne or more comparison stages, each with its own dimensions and tolerances.
stage_execution_strategyNoHow multiple stages run. Defaults to DRILL_DOWN_ON_FAILURE.
max_stage_concurrencyNoCaps parallelism when the strategy is PARALLEL.
notification_emailsNoAddresses that receive the result email.
email_detail_levelNoHow much detail the result email includes.
max_email_rowsNoCaps the number of result rows included in detailed emails.
{
"name": "daily-revenue-by-region",
"workspace_id": "4c60b399-37c6-4e3b-9337-7ab4d7692790",
"dimensions": ["Region"],
"measures": [
{ "name": "Amount", "aggregation": "SUM" }
],
"connections": [ /* see Connections */ ],
"stages": [
{
"dimensions": ["Region"],
"tolerances": [
{ "measure_name": "Amount", "type": "ABSOLUTE", "value": 0.01 }
]
}
],
"stage_execution_strategy": "DRILL_DOWN_ON_FAILURE",
"notification_emails": ["data-team@example.com"],
"email_detail_level": "FULL_DATA"
}

dimensions are the columns DataRecs groups by before comparing — the equivalent of SQL GROUP BY. Measures are computed within each group, and groups are matched across the two sources by their dimension values.

"dimensions": ["Region", "Category"]
  • Use the canonical column names here (the keys you define in each connection’s column_mapping), not the raw source column names.
  • Rows with an empty dimension value are grouped together under a single “empty” group, consistently on both sources.
  • An empty dimensions array compares a single grand-total group (every row folded together).

measures define what to compare. Each measure pairs a name (a canonical column, or any label for COUNT_STAR) with an aggregation.

"measures": [
{ "name": "Amount", "aggregation": "SUM" },
{ "name": "OrderId", "aggregation": "COUNT_DISTINCT" },
{ "name": "rowcount", "aggregation": "COUNT_STAR" }
]
AggregationComputesColumn requirement
SUMTotal of the valuesNumeric
AVGMean of the valuesNumeric
MINSmallest valueNumeric, text, or date
MAXLargest valueNumeric, text, or date
COUNTCount of non-empty valuesAny
COUNT_DISTINCTCount of distinct valuesAny
COUNT_STARCount of rows (no column needed)
VARIANCEStatistical varianceNumeric
STD_DEVStandard deviationNumeric

connections is an array of exactly two data sources. Each entry locates the data and maps its columns onto your canonical names.

"connections": [
{
"connection_id": "4e809504-9cfa-4c94-b66d-2ce9bd53ea08",
"source_database": "Quarterly Sheet (1AbC...xyz)",
"source_schema": "Q1_Sales",
"source_table": "Q1_Sales",
"column_mapping": { "Region": "region_name", "Amount": "net_amount" },
"filters": [],
"alias": "Google Sheet"
},
{
"connection_id": "75479276-1431-4fac-a179-06ccd8fbafc6",
"source_database": "Finance Workbook (01BPDZ...4E7UXF)",
"source_schema": "Q1_Sales",
"source_table": "Q1_Sales",
"column_mapping": { "Region": "Region", "Amount": "Amount" },
"filters": [],
"alias": "Excel/SharePoint"
}
]
FieldRequiredPurpose
connection_idYesThe configured connection to extract through.
source_schemaYesThe schema (database sources) or tab/worksheet (spreadsheets) to read.
source_tableYesThe table, or the named table/range within a spreadsheet tab.
source_databaseSometimesThe database, spreadsheet, or workbook identifier. Required for spreadsheet sources (see below).
column_mappingYesMaps your canonical column names to the actual column names in this source.
filtersNoPer-source filter conditions (see Filters).
aliasNoA display label for this source in results and emails.

column_mapping is how DataRecs reconciles sources whose column names differ. The keys are your canonical names (used everywhere else in the job — dimensions, measures, filters); the values are the real column names in that specific source.

"column_mapping": { "Region": "region_name", "Amount": "net_amount" }

If both sources use the same column names, the mapping is an identity map ({ "Region": "Region" }). The mapping must include every dimension and every measure name; it may additionally include any column you filter on.

For database sources, source_database is optional. For spreadsheet sources it is required and encodes the spreadsheet/workbook identifier:

  • Google Sheets — the spreadsheet ID, conventionally written as a display label with the ID in trailing parentheses: "My Sheet (1AbC...xyz)". DataRecs reads the ID from the parentheses.
  • Excel/SharePoint — the workbook’s drive-item ID in trailing parentheses: "My Workbook (01BPDZ...4E7UXF)". The SharePoint site is taken from the connection configuration.

filters restrict which rows each source contributes. They are applied per source, before aggregation, and multiple filters are combined with AND. Apply the same filters to both connections so each side compares the same logical subset.

"filters": [
{ "column": "Status", "operator": "=", "values": ["Active"] },
{ "column": "Amount", "operator": ">", "values": ["100"] }
]
FieldRequiredPurpose
columnYesThe canonical column to filter on.
operatorYesOne of the operators below.
valuesSometimesThe comparison value(s). Omitted for IS NULL / IS NOT NULL.
OperatorvaluesExample
=one{ "column": "Status", "operator": "=", "values": ["Active"] }
!=one{ "column": "Status", "operator": "!=", "values": ["Excluded"] }
> < >= <=one{ "column": "Amount", "operator": ">=", "values": ["100"] }
INone or more{ "column": "Region", "operator": "IN", "values": ["US", "UK"] }
NOT INone or more{ "column": "Region", "operator": "NOT IN", "values": ["EU", "APAC"] }
LIKEone{ "column": "Name", "operator": "LIKE", "values": ["A%"] }
IS NULLnone{ "column": "Notes", "operator": "IS NULL" }
IS NOT NULLnone{ "column": "Notes", "operator": "IS NOT NULL" }

Behaviour highlights (full detail in Data Types, Filtering & Comparison):

  • Range operators (>, <, >=, <=) compare numerically on numeric columns and lexicographically on text columns. >= and <= include the boundary value.
  • IS NULL / IS NOT NULL treat an empty string "" the same as NULL.
  • A filter that excludes every row yields a MATCHED result with zero rows — not an error.
  • values may contain {{parameter}} placeholders resolved at run time.

stages is an array of one or more comparison passes. Each stage has its own dimensions (grouping) and tolerances (per-measure strictness). A single-stage job is the common case; multiple stages enable drill-down.

"stages": [
{
"dimensions": ["Region"],
"tolerances": [
{ "measure_name": "Amount", "type": "ABSOLUTE", "value": 0.01 },
{ "measure_name": "Revenue", "type": "PERCENT", "value": 5 }
]
}
]

A tolerance says how far a measure may differ between sources before the group is a mismatch.

FieldPurpose
measure_nameThe measure this tolerance applies to. Must match a defined measure.
typeABSOLUTE or PERCENT.
valueThe threshold.
  • ABSOLUTE — the raw difference may be at most value. A group fails when delta > value.
  • PERCENT — the percentage difference (relative to the smaller source value) may be at most value. A group fails when percentage_difference > value.
  • No tolerance, or value: 0 — any non-zero difference is a mismatch (exact match required).

With multiple measures, a group is MATCHED only if every measure passes. One breach fails the group. A zero-versus-non-zero difference under a PERCENT tolerance always fails (its percentage difference is treated as 100%).

stage_execution_strategy controls how multiple stages run. For a single-stage job all strategies behave identically.

StrategyBehaviour
DRILL_DOWN_ON_FAILURE (default)Stages run from coarsest to finest. A stage only runs for the groups that failed the previous stage. If a coarse stage matches, the finer stages are skipped. Efficient for large reconciliations.
SEQUENTIALEvery stage runs in order, independently, regardless of pass/fail. No drill-down filtering.
PARALLELAll stages run concurrently and independently. Use max_stage_concurrency to cap how many run at once.

When using DRILL_DOWN_ON_FAILURE, each stage must include all dimensions of the previous stage (it can only get finer, never change keys). For example:

"stages": [
{ "dimensions": ["Region"], "tolerances": [...] },
{ "dimensions": ["Region", "Category"], "tolerances": [...] },
{ "dimensions": ["Region", "Category", "Product"], "tolerances": [...] }
]

Stage 0 compares per Region. Any Region that fails is drilled into per Region + Category in stage 1, and failing Region + Category groups are drilled into per Product in stage 2. SEQUENTIAL and PARALLEL stages have no such restriction and may use independent dimension sets.

max_stage_concurrency only applies to PARALLEL and caps how many stages execute at once.

FieldPurpose
notification_emailsAddresses that receive the result email when a run finalises. Validated against your tenant’s recipient rules.
email_detail_levelHow much detail the email includes (e.g. summary only vs. full result data). A per-job override can only lower the workspace/tenant level, never raise it.
max_email_rowsCaps the number of result rows in detailed emails. Capped by the tenant’s max_email_rows.

Exactly one notification email is sent per finalised run. A cancelled run does not send a result email.

DataRecs rejects a job at creation time if any of these are violated:

  • measures must be non-empty.
  • stages must be non-empty.
  • Every dimension and measure name must appear as a key in every connection’s column_mapping. Extra mapping keys are allowed only if they are filter columns.
  • Every tolerance measure_name must reference a defined measure.
  • Under DRILL_DOWN_ON_FAILURE, each stage must include all dimensions of the previous stage.
  • A filter must supply values unless its operator is IS NULL or IS NOT NULL.